Background

This module builds on code contained in Coronavirus_Statistics_USAF_v005.Rmd. This file includes the latest code for analyzing data from USA Facts. USA Facts maintains data on cases and deaths by county for coronavirus in the US. Downloaded data are unique by county with date as a column and a separate file for each of cases, deaths, and population.

The intent of this code is to source updated functions that allow for readRunUSAFacts() to be run to obtain, read, process, and analyze data from USA Facts.

Sourcing Functions

The tidyverse library is loaded, and the functions used for CDC daily processing are sourced. Additionally, specific functions for USA Facts are also sourced:

library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.3     v purrr   0.3.4
## v tibble  3.1.1     v dplyr   1.0.6
## v tidyr   1.1.3     v stringr 1.4.0
## v readr   1.4.0     v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
# Functions are available in source file
source("./Generic_Added_Utility_Functions_202105_v001.R")
source("./Coronavirus_CDC_Daily_Functions_v001.R")
source("./Coronavirus_USAF_Functions_v001.R")

Further, the mapping file specific to USA Facts is sourced:

source("./Coronavirus_USAF_Default_Mappings_v001.R")

Updated functions for diagnoseClusters(), createDetailedSummaries(), createSummary(), and helperSummaryMap() are added to Coronavirus_USAF_Functions_v001.R. These functions should be checked for consistency with state-level data with just a single copy kept later.

Example Process

The functions are tested on previously downloaded data, with results cached:

urlMapper[["usafCase"]] <- "https://static.usafacts.org/public/data/covid-19/covid_confirmed_usafacts.csv"
urlMapper[["usafDeath"]] <- "https://static.usafacts.org/public/data/covid-19/covid_deaths_usafacts.csv"
urlMapper[["usafPop"]] <- "https://static.usafacts.org/public/data/covid-19/covid_county_population_usafacts.csv"

readList <- list("usafCase"="./RInputFiles/Coronavirus/covid_confirmed_usafacts_downloaded_20220308.csv", 
                 "usafDeath"="./RInputFiles/Coronavirus/covid_deaths_usafacts_downloaded_20220308.csv"
                 )
compareList <- list("usafCase"=readFromRDS("cty_newdata_20220202")$dfRaw$usafCase, 
                    "usafDeath"=readFromRDS("cty_newdata_20220202")$dfRaw$usafDeath
                    )

# Use existing clusters
cty_chkdata_20220308 <- readRunUSAFacts(maxDate="2022-03-06", 
                                        downloadTo=lapply(readList, 
                                                          FUN=function(x) if(file.exists(x)) NA else x
                                                          ),
                                        readFrom=readList, 
                                        compareFile=compareList, 
                                        writeLog="./RInputFiles/Coronavirus/USAF_NewData_20220308_chk_v005.log", 
                                        ovrwriteLog=TRUE,
                                        useClusters=readFromRDS("cty_newdata_20210813")$useClusters,
                                        skipAssessmentPlots=FALSE,
                                        brewPalette="Paired"
                                        )
## 
## No file has been downloaded, will use existing file: ./RInputFiles/Coronavirus/covid_confirmed_usafacts_downloaded_20220308.csv
## 
## -- Column specification --------------------------------------------------------
## cols(
##   .default = col_double(),
##   `County Name` = col_character(),
##   State = col_character(),
##   StateFIPS = col_character()
## )
## i Use `spec()` for the full column specifications.
## 
## *** File has been checked for uniqueness by: countyFIPS countyName state stateFIPS 
## 
## 
## *** File has been checked for uniqueness by: countyFIPS stateFIPS date

## 
## 
## Checking for similarity of: column names
## In reference but not in current: 
## In current but not in reference: 
## 
## Checking for similarity of: date
## In reference but not in current: 0
## In current but not in reference: 32
## Detailed differences available in: ./RInputFiles/Coronavirus/USAF_NewData_20220308_chk_v005.log
## 
## Checking for similarity of: county
## In reference but not in current: 
## In current but not in reference:

## 
## 
## ***Differences of at least 5 and at least 5%
## 
## 6 records
## Detailed output available in log: ./RInputFiles/Coronavirus/USAF_NewData_20220308_chk_v005.log
## 
## 
## ***Differences of at least 0 and at least 0.1%
## 
## 228 records
## Detailed output available in log: ./RInputFiles/Coronavirus/USAF_NewData_20220308_chk_v005.log
## 
## 
## No file has been downloaded, will use existing file: ./RInputFiles/Coronavirus/covid_deaths_usafacts_downloaded_20220308.csv
## 
## -- Column specification --------------------------------------------------------
## cols(
##   .default = col_double(),
##   `County Name` = col_character(),
##   State = col_character(),
##   StateFIPS = col_character()
## )
## i Use `spec()` for the full column specifications.

## 
## *** File has been checked for uniqueness by: countyFIPS countyName state stateFIPS 
## 
## 
## *** File has been checked for uniqueness by: countyFIPS stateFIPS date

## 
## 
## Checking for similarity of: column names
## In reference but not in current: 
## In current but not in reference: 
## 
## Checking for similarity of: date
## In reference but not in current: 0
## In current but not in reference: 32
## Detailed differences available in: ./RInputFiles/Coronavirus/USAF_NewData_20220308_chk_v005.log
## 
## Checking for similarity of: county
## In reference but not in current: 
## In current but not in reference:

## 
## 
## ***Differences of at least 5 and at least 5%
## 
## 44 records
## Detailed output available in log: ./RInputFiles/Coronavirus/USAF_NewData_20220308_chk_v005.log

## 
## 
## ***Differences of at least 0 and at least 0.1%
## 
## 253 records
## Detailed output available in log: ./RInputFiles/Coronavirus/USAF_NewData_20220308_chk_v005.log
## 
## 
## Column sums before and after applying filtering rules:
## # A tibble: 3 x 4
##   isType    cases new_cases            n
##   <chr>     <dbl>     <dbl>        <dbl>
## 1 before 1.92e+10   7.73e+7 2468189     
## 2 after  1.90e+10   7.69e+7 2428766     
## 3 pctchg 5.39e- 3   5.14e-3       0.0160
## 
## 
## Column sums before and after applying filtering rules:
## # A tibble: 3 x 4
##   isType  deaths  new_deaths            n
##   <chr>    <dbl>       <dbl>        <dbl>
## 1 before 3.27e+8 951208      2468189     
## 2 after  3.20e+8 907057      2428766     
## 3 pctchg 2.11e-2      0.0464       0.0160

## NULL

# Plot all counties based on closest cluster
sparseCountyClusterMap(cty_chkdata_20220308$useClusters, 
                       caption="Includes only counties with 25k+ population",
                       brewPalette="viridis"
                       )

# Save the check file
saveToRDS(cty_chkdata_20220308, ovrWriteError=FALSE)
## 
## File already exists: ./RInputFiles/Coronavirus/cty_chkdata_20220308.RDS 
## 
## Not replacing the existing file since ovrWrite=FALSE
## NULL
# Confirm that it is identical to the previous process
cty_newdata_20220308 <- readFromRDS("cty_newdata_20220308")
# Same names in the list
all.equal(names(cty_chkdata_20220308), names(cty_newdata_20220308))
## [1] TRUE
# Identical items in the list
sapply(names(cty_chkdata_20220308), 
       FUN=function(x) identical(cty_chkdata_20220308[[x]], cty_newdata_20220308[[x]])
       )
##   countyData        dfRaw    dfProcess  dfPerCapita  useClusters      maxDate 
##         TRUE         TRUE         TRUE         TRUE         TRUE         TRUE 
## plotDataList 
##        FALSE
# ggplot2 objects are never identical due to environment; confirm they are all.equal
all.equal(cty_chkdata_20220308$plotDataList, cty_newdata_20220308$plotDataList)
## [1] TRUE

The capability for obtaining and processing county-level vaccines data is included:

# Read the relevant vaccines data
vaxPartialRaw_20220309 <- downloadCountyVaccines(loc="./RInputFiles/Coronavirus/county_vaccine_20220309.csv")
## 
## -- Column specification --------------------------------------------------------
## cols(
##   .default = col_double(),
##   Date = col_character(),
##   FIPS = col_character(),
##   Recip_County = col_character(),
##   Recip_State = col_character(),
##   SVI_CTGY = col_character(),
##   Metro_status = col_character()
## )
## i Use `spec()` for the full column specifications.
## 
## Records from other than 50 states and DC:
## # A tibble: 9 x 2
##   state     n
##   <chr> <int>
## 1 AS      442
## 2 FM      447
## 3 GU      897
## 4 MH      434
## 5 MP      444
## 6 PR    35625
## 7 PW      441
## 8 UNK     308
## 9 VI     1794

vaxPartialRaw_20220309
## # A tibble: 1,480,110 x 10
##    date       FIPS  countyName    state vxcpoppct vxcgte18pct vxcgte65pct    pop
##    <date>     <chr> <chr>         <chr>     <dbl>       <dbl>       <dbl>  <dbl>
##  1 2022-03-08 06101 Sutter County CA         60.3        73          87.7  96971
##  2 2022-03-08 12009 Brevard Coun~ FL         64          72.9        88.4 601942
##  3 2022-03-08 19085 Harrison Cou~ IA         51.6        62.1        88.9  14049
##  4 2022-03-08 06069 San Benito C~ CA         66          75.6        83.3  62808
##  5 2022-03-08 19057 Des Moines C~ IA         51.1        61.9        83.4  38967
##  6 2022-03-08 05073 Lafayette Co~ AR         41.2        48.5        57.4   6624
##  7 2022-03-08 16021 Boundary Cou~ ID         34.8        42.8        63.6  12245
##  8 2022-03-08 20073 Greenwood Co~ KS         49          59.6        80.6   5982
##  9 2022-03-08 33005 Cheshire Cou~ NH         61.6        69.1        89.6  76085
## 10 2022-03-08 37159 Rowan County  NC         43.8        52.5        74.1 142088
## # ... with 1,480,100 more rows, and 2 more variables: popgte18 <dbl>,
## #   popgte65 <dbl>
# Repair the data for 65+
vaxFix65_20220309 <- repairVaxPopulation(vaxPartialRaw_20220309, colsRepair=c("popgte65"))
## 
## Count of NA records by column
##           state            FIPS popgte65_minpop popgte65_maxpop    popgte65_nnA 
##               0               0               0               0               0 
##               n 
##               0 
## 
## Records where minimum and maximum population differ# A tibble: 0 x 5
## # ... with 5 variables: state <chr>, FIPS <chr>, age <chr>, minpop <dbl>,
## #   maxpop <dbl>
vaxFix65_20220309
## # A tibble: 1,417,042 x 10
##    date       FIPS  countyName    state vxcpoppct vxcgte18pct vxcgte65pct    pop
##    <date>     <chr> <chr>         <chr>     <dbl>       <dbl>       <dbl>  <dbl>
##  1 2022-03-08 06101 Sutter County CA         60.3        73          87.7  96971
##  2 2022-03-08 12009 Brevard Coun~ FL         64          72.9        88.4 601942
##  3 2022-03-08 19085 Harrison Cou~ IA         51.6        62.1        88.9  14049
##  4 2022-03-08 06069 San Benito C~ CA         66          75.6        83.3  62808
##  5 2022-03-08 19057 Des Moines C~ IA         51.1        61.9        83.4  38967
##  6 2022-03-08 05073 Lafayette Co~ AR         41.2        48.5        57.4   6624
##  7 2022-03-08 16021 Boundary Cou~ ID         34.8        42.8        63.6  12245
##  8 2022-03-08 20073 Greenwood Co~ KS         49          59.6        80.6   5982
##  9 2022-03-08 33005 Cheshire Cou~ NH         61.6        69.1        89.6  76085
## 10 2022-03-08 37159 Rowan County  NC         43.8        52.5        74.1 142088
## # ... with 1,417,032 more rows, and 2 more variables: popgte18 <dbl>,
## #   popgte65 <dbl>

Correlations data can also be run:

corrList20220309 <- corrVaxBurden(lstCD=cty_newdata_20220308,
                                  dfVax=vaxPartialRaw_20220309, 
                                  minDateCD=c("2021-11-01", "2021-09-01"),
                                  maxDateCD="2022-02-28"
                                  )
## 
## Will run with parameters:
## burdenVar: cpm dpm 
## vaxVar: vxcpoppct vxcpoppct 
## minDateCD: 2021-11-01 2021-09-01 
## maxDateCD: 2022-02-28 2022-02-28
## `geom_smooth()` using formula 'y ~ x'

## `geom_smooth()` using formula 'y ~ x'

## 
## Call:
## lm(formula = get(burdenVar) ~ vaxMetric, data = dfReg, weights = pop)
## 
## Weighted Residuals:
##       Min        1Q    Median        3Q       Max 
## -57553717  -2611410   -275030   2650218 123342871 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 68194.65    1887.85   36.12   <2e-16 ***
## vaxMetric     530.27      33.93   15.63   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7406000 on 3140 degrees of freedom
## Multiple R-squared:  0.07217,    Adjusted R-squared:  0.07187 
## F-statistic: 244.2 on 1 and 3140 DF,  p-value: < 2.2e-16
## 
## 
## Call:
## lm(formula = get(burdenVar) ~ vaxMetric * type + 0 - vaxMetric, 
##     data = dfReg, weights = pop)
## 
## Weighted Residuals:
##       Min        1Q    Median        3Q       Max 
## -57474876  -2595922   -291957   2584956 122543573 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## type<25k                58253.65    6207.86   9.384  < 2e-16 ***
## type>500k               66890.28    3631.97  18.417  < 2e-16 ***
## type100k-500k           72662.44    3757.76  19.337  < 2e-16 ***
## type25k-100k            65791.22    4409.38  14.921  < 2e-16 ***
## vaxMetric:type<25k        731.64     145.96   5.013 5.67e-07 ***
## vaxMetric:type>500k       553.88      60.14   9.210  < 2e-16 ***
## vaxMetric:type100k-500k   429.71      69.68   6.167 7.84e-10 ***
## vaxMetric:type25k-100k    628.19      96.06   6.540 7.17e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7400000 on 3134 degrees of freedom
## Multiple R-squared:  0.9476, Adjusted R-squared:  0.9474 
## F-statistic:  7081 on 8 and 3134 DF,  p-value: < 2.2e-16
## `geom_smooth()` using formula 'y ~ x'

## `geom_smooth()` using formula 'y ~ x'

## 
## Call:
## lm(formula = get(burdenVar) ~ vaxMetric, data = dfReg, weights = pop)
## 
## Weighted Residuals:
##      Min       1Q   Median       3Q      Max 
## -1371927   -23067    44651   121402   729890 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1254.2057    22.8836   54.81   <2e-16 ***
## vaxMetric     -9.5890     0.4779  -20.07   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 160100 on 3140 degrees of freedom
## Multiple R-squared:  0.1136, Adjusted R-squared:  0.1134 
## F-statistic: 402.6 on 1 and 3140 DF,  p-value: < 2.2e-16
## 
## 
## Call:
## lm(formula = get(burdenVar) ~ vaxMetric * type + 0 - vaxMetric, 
##     data = dfReg, weights = pop)
## 
## Weighted Residuals:
##     Min      1Q  Median      3Q     Max 
## -885434  -65276   -4779   67115 1060610 
## 
## Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)    
## type<25k                1760.1655    74.0739  23.762  < 2e-16 ***
## type>500k                758.0391    29.0606  26.085  < 2e-16 ***
## type100k-500k           1304.8595    41.8238  31.199  < 2e-16 ***
## type25k-100k            1668.2572    50.2491  33.200  < 2e-16 ***
## vaxMetric:type<25k       -11.1910     2.0925  -5.348 9.52e-08 ***
## vaxMetric:type>500k       -3.2769     0.5600  -5.851 5.38e-09 ***
## vaxMetric:type100k-500k   -9.4838     0.8902 -10.653  < 2e-16 ***
## vaxMetric:type25k-100k   -11.0196     1.2902  -8.541  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 140000 on 3134 degrees of freedom
## Multiple R-squared:  0.8063, Adjusted R-squared:  0.8058 
## F-statistic:  1631 on 8 and 3134 DF,  p-value: < 2.2e-16
corrList20220309
## [[1]]
## # A tibble: 3,142 x 10
##    fips  countyName     pop state vxcpoppct vxcgte18pct vxcgte65pct    cpm   dpm
##    <chr> <chr>        <dbl> <chr>     <dbl>       <dbl>       <dbl>  <dbl> <dbl>
##  1 42129 Westmorela~ 348899 PA         54.2        63.2        81.2 96676. 1238.
##  2 39109 Miami Coun~ 106987 OH         42.8        53          77.6 91619. 1318.
##  3 16045 Gem County   18112 ID         34.5        44.7        71   44611.  939.
##  4 05071 Johnson Co~  26578 AR         44.4        54.8        67.7 74761.  452.
##  5 48411 San Saba C~   6055 TX         32.7        40          54.4 38811.  991.
##  6 51710 Norfolk ci~ 242742 VA         48.2        56.7        79.4 69044.  470.
##  7 27007 Beltrami C~  47188 MN         52.5        64.7        91.4 94982.  805.
##  8 05067 Jackson Co~  16719 AR         34.4        41.2        66.9 98870.  658.
##  9 51735 Poquoson c~  12271 VA         43.5        52.8        69.7 79945.  652.
## 10 47047 Fayette Co~  41133 TN         49.2        57.9        76.3 97610. 1386.
## # ... with 3,132 more rows, and 1 more variable: type <chr>
## 
## [[2]]
## # A tibble: 3,142 x 10
##    fips  countyName     pop state vxcpoppct vxcgte18pct vxcgte65pct    cpm   dpm
##    <chr> <chr>        <dbl> <chr>     <dbl>       <dbl>       <dbl>  <dbl> <dbl>
##  1 51710 Norfolk ci~ 242742 VA         39.4        46.7        68.7 8.43e4  704.
##  2 01127 Walker Cou~  63521 AL         32.7        41          68.1 1.46e5 1810.
##  3 28153 Wayne Coun~  20183 MS         27.1        34.7        58.2 1.06e5 1288.
##  4 26045 Eaton Coun~ 110268 MI         47.4        56.5        77.4 1.41e5 1533.
##  5 48313 Madison Co~  14284 TX          0           0           0   1.00e5 1050.
##  6 39095 Lucas Coun~ 428348 OH         39.2        48.2        67.2 1.22e5 1158.
##  7 29153 Ozark Coun~   9174 MO         24.7        30          41   7.60e4 3052.
##  8 29071 Franklin C~ 103967 MO         44.5        55          81.4 9.62e4 1212.
##  9 21149 McLean Cou~   9207 KY         40.9        51.6        76.7 1.56e5 1629.
## 10 13233 Polk County  42613 GA          9.4        11.7         9.8 9.84e4 1877.
## # ... with 3,132 more rows, and 1 more variable: type <chr>

Comparisons can be run between summed county and state data:

statePerCapita <- readFromRDS("cdc_daily_220304")$dfPerCapita

tempStateCompareList <- compareStateSummedCounty(dfState=statePerCapita, 
                                                 dfCounty=cty_newdata_20220308$dfPerCapita, 
                                                 aggData=TRUE,
                                                 dateThru="2022-02-28", 
                                                 returnData=TRUE
                                                 )
## Warning: Removed 6 row(s) containing missing values (geom_path).

tempStateCompareList
## $dfState
## # A tibble: 3,160 x 6
##    date       name       src   value value7 state     
##    <date>     <chr>      <chr> <dbl>  <dbl> <chr>     
##  1 2020-01-01 cases      state     0     NA Aggregated
##  2 2020-01-01 deaths     state     0     NA Aggregated
##  3 2020-01-01 new_cases  state     0     NA Aggregated
##  4 2020-01-01 new_deaths state     0     NA Aggregated
##  5 2020-01-02 cases      state     0     NA Aggregated
##  6 2020-01-02 deaths     state     0     NA Aggregated
##  7 2020-01-02 new_cases  state     0     NA Aggregated
##  8 2020-01-02 new_deaths state     0     NA Aggregated
##  9 2020-01-03 cases      state     0     NA Aggregated
## 10 2020-01-03 deaths     state     0     NA Aggregated
## # ... with 3,150 more rows
## 
## $dfCounty
## # A tibble: 3,052 x 6
##    date       name       src    value  value7 state     
##    <date>     <chr>      <chr>  <dbl>   <dbl> <chr>     
##  1 2020-01-25 cases      county   751 750.    Aggregated
##  2 2020-01-25 deaths     county     1   1     Aggregated
##  3 2020-01-25 new_cases  county    10 111.    Aggregated
##  4 2020-01-25 new_deaths county     0   0.143 Aggregated
##  5 2020-01-26 cases      county   759 758.    Aggregated
##  6 2020-01-26 deaths     county     1   1     Aggregated
##  7 2020-01-26 new_cases  county     8   8     Aggregated
##  8 2020-01-26 new_deaths county     0   0     Aggregated
##  9 2020-01-27 cases      county   769 766.    Aggregated
## 10 2020-01-27 deaths     county     1   1     Aggregated
## # ... with 3,042 more rows

The scoring metric is converted to functional form:

# Data for score similarity process
tempStateCompareList_v2 <- compareStateSummedCounty(dfState=statePerCapita, 
                                                    dfCounty=cty_newdata_20220308$dfPerCapita, 
                                                    inclStates=c(state.abb, "DC"), 
                                                    dateThru="2022-02-28", 
                                                    makePlot=FALSE,
                                                    returnData=TRUE
                                                    )
scoreSimilarity(tempStateCompareList_v2, minDate="2020-02-15", maxDate="2022-02-15", makeFacet=FALSE)

# Example states with meaningful disconnects on 1+ metrics
compareStateSummedCounty(dfState=statePerCapita, 
                         dfCounty=cty_newdata_20220308$dfPerCapita, 
                         inclStates=c("FL", "MO", "OK", "TX", "ME", "NE", "KY", "AL", "GA"), 
                         dateThru="2022-02-28", 
                         makePlot=TRUE,
                         returnData=FALSE
)
## Warning: Removed 6 row(s) containing missing values (geom_path).

## Warning: Removed 6 row(s) containing missing values (geom_path).

## Warning: Removed 6 row(s) containing missing values (geom_path).

## Warning: Removed 6 row(s) containing missing values (geom_path).

## Warning: Removed 6 row(s) containing missing values (geom_path).

## Warning: Removed 6 row(s) containing missing values (geom_path).

## Warning: Removed 6 row(s) containing missing values (geom_path).

## Warning: Removed 6 row(s) containing missing values (geom_path).

## Warning: Removed 6 row(s) containing missing values (geom_path).

While cumulative deaths and cumulative cases are generally well aligned between sources, rolling 7-day deaths and cases are frequently divergent by source.

An integrated vaccines dataset can be created:

allState_20220309 <- integrateStateVaccine(vaxFix65_20220309, statePerCap=statePerCapita)
allState_20220309
## # A tibble: 23,001 x 11
##    state date       vxcpoppct vxcgte18pct vxcgte65pct ctypoppct ctygte18pct
##    <chr> <date>         <dbl>       <dbl>       <dbl>     <dbl>       <dbl>
##  1 AK    2020-12-13        NA          NA          NA         0           0
##  2 AL    2020-12-13        NA          NA          NA         0           0
##  3 AR    2020-12-13        NA          NA          NA         0           0
##  4 AZ    2020-12-13        NA          NA          NA         0           0
##  5 CA    2020-12-13        NA          NA          NA         0           0
##  6 CO    2020-12-13        NA          NA          NA         0           0
##  7 CT    2020-12-13        NA          NA          NA         0           0
##  8 DC    2020-12-13        NA          NA          NA         0           0
##  9 DE    2020-12-13        NA          NA          NA         0           0
## 10 FL    2020-12-13        NA          NA          NA         0           0
## # ... with 22,991 more rows, and 4 more variables: ctygte65pct <dbl>,
## #   pop <dbl>, popgte18 <dbl>, popgte65 <dbl>

Functionality for exploring vaccine evolution is included:

# Example for a single state
stateAgeVaxEvolution(allState_20220309, keyState="FL", minDate="2020-12-15", returnData=TRUE)
## Warning: Removed 5 row(s) containing missing values (geom_path).

## # A tibble: 2,694 x 6
##    state date       name        value src           age  
##    <chr> <date>     <chr>       <dbl> <chr>         <chr>
##  1 FL    2020-12-15 vxcpoppct       0 State         All  
##  2 FL    2020-12-15 vxcgte18pct     0 State         18+  
##  3 FL    2020-12-15 vxcgte65pct     0 State         65+  
##  4 FL    2020-12-15 ctypoppct       0 Summed County All  
##  5 FL    2020-12-15 ctygte18pct     0 Summed County 18+  
##  6 FL    2020-12-15 ctygte65pct     0 Summed County 65+  
##  7 FL    2020-12-16 vxcpoppct       0 State         All  
##  8 FL    2020-12-16 vxcgte18pct     0 State         18+  
##  9 FL    2020-12-16 vxcgte65pct     0 State         65+  
## 10 FL    2020-12-16 ctypoppct       0 Summed County All  
## # ... with 2,684 more rows
# Example for multiple states without plotting
stateAgeVaxEvolution(allState_20220309, keyState=c("CA", "FL", "TX", "NY", "PA", "IL"), createPlot=FALSE)
## # A tibble: 16,236 x 6
##    state date       name        value src           age  
##    <chr> <date>     <chr>       <dbl> <chr>         <chr>
##  1 CA    2020-12-13 vxcpoppct      NA State         All  
##  2 CA    2020-12-13 vxcgte18pct    NA State         18+  
##  3 CA    2020-12-13 vxcgte65pct    NA State         65+  
##  4 CA    2020-12-13 ctypoppct       0 Summed County All  
##  5 CA    2020-12-13 ctygte18pct     0 Summed County 18+  
##  6 CA    2020-12-13 ctygte65pct     0 Summed County 65+  
##  7 FL    2020-12-13 vxcpoppct      NA State         All  
##  8 FL    2020-12-13 vxcgte18pct    NA State         18+  
##  9 FL    2020-12-13 vxcgte65pct    NA State         65+  
## 10 FL    2020-12-13 ctypoppct       0 Summed County All  
## # ... with 16,226 more rows
# Example for multiple states with plotting
stateAgeVaxEvolution(allState_20220309, keyState=c("CT", "AR", "AZ"), minDate="2020-12-15", returnData=TRUE)
## Warning: Removed 5 row(s) containing missing values (geom_path).

## # A tibble: 8,082 x 6
##    state date       name        value src           age  
##    <chr> <date>     <chr>       <dbl> <chr>         <chr>
##  1 AR    2020-12-15 vxcpoppct       0 State         All  
##  2 AR    2020-12-15 vxcgte18pct     0 State         18+  
##  3 AR    2020-12-15 vxcgte65pct     0 State         65+  
##  4 AR    2020-12-15 ctypoppct       0 Summed County All  
##  5 AR    2020-12-15 ctygte18pct     0 Summed County 18+  
##  6 AR    2020-12-15 ctygte65pct     0 Summed County 65+  
##  7 AZ    2020-12-15 vxcpoppct       0 State         All  
##  8 AZ    2020-12-15 vxcgte18pct     0 State         18+  
##  9 AZ    2020-12-15 vxcgte65pct     0 State         65+  
## 10 AZ    2020-12-15 ctypoppct       0 Summed County All  
## # ... with 8,072 more rows

Scores can be created for every state, reflecting differences in the vaccination data:

scoreVaxSimilarity(allState_20220309)

scoreVaxSimilarity(allState_20220309, minDate="2021-12-01", maxDate="2022-02-28", returnBaseData=TRUE)

## # A tibble: 459 x 7
##    state ym      age       n  rmse cdcState ctySum
##    <chr> <chr>   <chr> <int> <dbl>    <dbl>  <dbl>
##  1 AK    2021-12 18+      31  2.30     67.2   65.0
##  2 AK    2021-12 65+      31  2.25     83.9   81.7
##  3 AK    2021-12 All      31  1.84     55.6   53.7
##  4 AK    2022-01 18+      31  2.48     68.6   66.2
##  5 AK    2022-01 65+      31  2.35     84.4   82.1
##  6 AK    2022-01 All      31  1.98     57.3   55.4
##  7 AK    2022-02 18+      28  1.88     71.4   69.6
##  8 AK    2022-02 65+      28  1.86     85.0   83.2
##  9 AK    2022-02 All      28  1.48     60.2   58.7
## 10 AL    2021-12 18+      31  4.26     57.2   53.0
## # ... with 449 more rows
stateAgeVaxEvolution(allState_20220309, 
                     keyState=c("HI", "TX", "VA", "GA", "CO", "WV", "VT"), 
                     createPlot = TRUE
                     )
## Warning: Removed 6 row(s) containing missing values (geom_path).

County-level burden process mapping is included:

dfRoll20220308 <- createBurdenCountyDate(cty_newdata_20220308, 
                                         maxDate="2022-02-28", 
                                         rollBy=months(c(0, -3, -6, -9)), 
                                         dateSpan=91
                                         )
dfRoll20220308
## # A tibble: 12,568 x 7
##    countyFIPS state asofDate    tdpm    tcpm dpm91  cpm91
##    <chr>      <chr> <date>     <dbl>   <dbl> <dbl>  <dbl>
##  1 01001      AL    2022-02-28 3472. 277614.  662. 89370.
##  2 01003      AL    2022-02-28 2867. 246280.  228. 75674.
##  3 01005      AL    2022-02-28 3767. 220570.  527. 70890.
##  4 01007      AL    2022-02-28 4421. 284674.  223. 90873.
##  5 01009      AL    2022-02-28 3770. 255249.  450. 69917.
##  6 01011      AL    2022-02-28 4851. 225621.  396. 74547.
##  7 01013      AL    2022-02-28 6119. 258896.  977. 82271.
##  8 01015      AL    2022-02-28 5176. 278491.  616. 79680.
##  9 01017      AL    2022-02-28 4571. 253533.  301. 79599.
## 10 01019      AL    2022-02-28 2978. 193694.  573. 72377.
## # ... with 12,558 more rows
makeBurdenDatePlot(dfRoll20220308, keyVar="cpm91", timeLabel="3-month", varCeiling=100000, varDivBy=1000)

makeBurdenDatePlot(dfRoll20220308, keyVar="dpm91", timeLabel="3-month", varCeiling=1500)

The latest county-level burden data are downloaded:

urlMapper[["usafCase"]] <- "https://static.usafacts.org/public/data/covid-19/covid_confirmed_usafacts.csv"
urlMapper[["usafDeath"]] <- "https://static.usafacts.org/public/data/covid-19/covid_deaths_usafacts.csv"
urlMapper[["usafPop"]] <- "https://static.usafacts.org/public/data/covid-19/covid_county_population_usafacts.csv"

readList <- list("usafCase"="./RInputFiles/Coronavirus/covid_confirmed_usafacts_downloaded_20220414.csv", 
                 "usafDeath"="./RInputFiles/Coronavirus/covid_deaths_usafacts_downloaded_20220414.csv"
                 )
compareList <- list("usafCase"=readFromRDS("cty_newdata_20220308")$dfRaw$usafCase, 
                    "usafDeath"=readFromRDS("cty_newdata_20220308")$dfRaw$usafDeath
                    )

# Use existing clusters
cty_newdata_20220414 <- readRunUSAFacts(maxDate="2022-04-12", 
                                        downloadTo=lapply(readList, 
                                                          FUN=function(x) if(file.exists(x)) NA else x
                                                          ),
                                        readFrom=readList, 
                                        compareFile=compareList, 
                                        writeLog="./RInputFiles/Coronavirus/USAF_NewData_20220414_chk_v005.log", 
                                        ovrwriteLog=TRUE,
                                        useClusters=readFromRDS("cty_newdata_20210813")$useClusters,
                                        skipAssessmentPlots=FALSE,
                                        brewPalette="Paired"
                                        )
## 
## -- Column specification --------------------------------------------------------
## cols(
##   .default = col_double(),
##   `County Name` = col_character(),
##   State = col_character(),
##   StateFIPS = col_character()
## )
## i Use `spec()` for the full column specifications.
## 
## *** File has been checked for uniqueness by: countyFIPS countyName state stateFIPS 
## 
## 
## *** File has been checked for uniqueness by: countyFIPS stateFIPS date

## 
## 
## Checking for similarity of: column names
## In reference but not in current: 
## In current but not in reference: 
## 
## Checking for similarity of: date
## In reference but not in current: 0
## In current but not in reference: 38
## Detailed differences available in: ./RInputFiles/Coronavirus/USAF_NewData_20220414_chk_v005.log
## 
## Checking for similarity of: county
## In reference but not in current: 
## In current but not in reference:

## 
## 
## ***Differences of at least 5 and at least 5%
## 
## 1 records
## Detailed output available in log: ./RInputFiles/Coronavirus/USAF_NewData_20220414_chk_v005.log
## 
## 
## ***Differences of at least 0 and at least 0.1%
## 
## 122 records
## Detailed output available in log: ./RInputFiles/Coronavirus/USAF_NewData_20220414_chk_v005.log
## 
## -- Column specification --------------------------------------------------------
## cols(
##   .default = col_double(),
##   `County Name` = col_character(),
##   State = col_character(),
##   StateFIPS = col_character()
## )
## i Use `spec()` for the full column specifications.

## 
## *** File has been checked for uniqueness by: countyFIPS countyName state stateFIPS 
## 
## 
## *** File has been checked for uniqueness by: countyFIPS stateFIPS date

## 
## 
## Checking for similarity of: column names
## In reference but not in current: 
## In current but not in reference: 
## 
## Checking for similarity of: date
## In reference but not in current: 0
## In current but not in reference: 38
## Detailed differences available in: ./RInputFiles/Coronavirus/USAF_NewData_20220414_chk_v005.log
## 
## Checking for similarity of: county
## In reference but not in current: 
## In current but not in reference:

## 
## 
## ***Differences of at least 5 and at least 5%
## 
## 1 records
## Detailed output available in log: ./RInputFiles/Coronavirus/USAF_NewData_20220414_chk_v005.log

## 
## 
## ***Differences of at least 0 and at least 0.1%
## 
## 118 records
## Detailed output available in log: ./RInputFiles/Coronavirus/USAF_NewData_20220414_chk_v005.log
## 
## 
## Column sums before and after applying filtering rules:
## # A tibble: 3 x 4
##   isType    cases new_cases            n
##   <chr>     <dbl>     <dbl>        <dbl>
## 1 before 2.21e+10   7.84e+7 2589523     
## 2 after  2.20e+10   7.80e+7 2548162     
## 3 pctchg 5.36e- 3   5.01e-3       0.0160
## 
## 
## Column sums before and after applying filtering rules:
## # A tibble: 3 x 4
##   isType  deaths  new_deaths            n
##   <chr>    <dbl>       <dbl>        <dbl>
## 1 before 3.64e+8 978846      2589523     
## 2 after  3.55e+8 930660      2548162     
## 3 pctchg 2.39e-2      0.0492       0.0160

## NULL

# Plot all counties based on closest cluster
sparseCountyClusterMap(cty_newdata_20220414$useClusters, 
                       caption="Includes only counties with 25k+ population",
                       brewPalette="viridis"
                       )

# Save the check file
saveToRDS(cty_newdata_20220414, ovrWriteError=FALSE)

A function is included for reading and fixing vaccines data, as well as running correlations to the burden data:

processCountyVaccines <- function(loc,
                                  ctyList,
                                  url="https://data.cdc.gov/api/views/8xkx-amqh/rows.csv?accessType=DOWNLOAD", 
                                  colsRepair=c("popgte65"), 
                                  ...
                                  ) {
    
    # FUNCTION ARGUMENTS:
    # loc: the location of the downloaded vaccines data
    # ctyList: processed list file of county-level burden data
    # url: the location for obtaining the vaccines data
    # colsRepair: columns in the raw vaccines data that require repairs to the population data
    # ...: arguments passed to corrVaxBurden()
    
    vaxRaw <- downloadCountyVaccines(loc=loc, url=url)
    vaxFix <- repairVaxPopulation(vaxRaw, colsRepair=colsRepair)
    corrList <- corrVaxBurden(lstCD=ctyList, dfVax=vaxRaw, ...)
    
    # Return the key items
    list(vaxRaw=vaxRaw, vaxFix=vaxFix, corrList=corrList)
    
}

cty_vaxdata_20220415 <- processCountyVaccines(loc="./RInputFiles/Coronavirus/county_vaccine_20220415.csv", 
                                              ctyList=cty_newdata_20220414, 
                                              minDateCD=c("2021-11-01", "2021-09-01"),
                                              maxDateCD="2022-03-31"
                                              )
## 
## -- Column specification --------------------------------------------------------
## cols(
##   .default = col_double(),
##   Date = col_character(),
##   FIPS = col_character(),
##   Recip_County = col_character(),
##   Recip_State = col_character(),
##   SVI_CTGY = col_character(),
##   Metro_status = col_character()
## )
## i Use `spec()` for the full column specifications.
## 
## Records from other than 50 states and DC:
## # A tibble: 10 x 2
##    state     n
##    <chr> <int>
##  1 AS      479
##  2 FM      484
##  3 GU      971
##  4 MH      471
##  5 MP      481
##  6 PR    38548
##  7 PW      478
##  8 UNK     308
##  9 VI     1942
## 10 <NA>     11

## Warning: Removed 16 rows containing non-finite values (stat_boxplot).

## Warning: Removed 16 rows containing non-finite values (stat_boxplot).

## Warning: Removed 16 rows containing non-finite values (stat_boxplot).

## 
## Count of NA records by column
##           state            FIPS popgte65_minpop popgte65_maxpop    popgte65_nnA 
##               0               0               0               0               0 
##               n 
##               0 
## 
## Records where minimum and maximum population differ# A tibble: 0 x 5
## # ... with 5 variables: state <chr>, FIPS <chr>, age <chr>, minpop <dbl>,
## #   maxpop <dbl>
## 
## 
## 
## Will run with parameters:
## burdenVar: cpm dpm 
## vaxVar: vxcpoppct vxcpoppct 
## minDateCD: 2021-11-01 2021-09-01 
## maxDateCD: 2022-03-31 2022-03-31
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 8 rows containing non-finite values (stat_smooth).
## Warning: Removed 8 rows containing missing values (geom_point).

## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 8 rows containing non-finite values (stat_smooth).

## Warning: Removed 8 rows containing missing values (geom_point).

## 
## Call:
## lm(formula = get(burdenVar) ~ vaxMetric, data = dfReg, weights = pop)
## 
## Weighted Residuals:
##       Min        1Q    Median        3Q       Max 
## -60027619  -2527753   -206065   2924501 119721627 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 66910.18    2015.81   33.19   <2e-16 ***
## vaxMetric     609.76      36.14   16.87   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7481000 on 3132 degrees of freedom
##   (8 observations deleted due to missingness)
## Multiple R-squared:  0.08332,    Adjusted R-squared:  0.08303 
## F-statistic: 284.7 on 1 and 3132 DF,  p-value: < 2.2e-16
## 
## 
## Call:
## lm(formula = get(burdenVar) ~ vaxMetric * type + 0 - vaxMetric, 
##     data = dfReg, weights = pop)
## 
## Weighted Residuals:
##       Min        1Q    Median        3Q       Max 
## -57742661  -2581953   -363789   2682004 116556752 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## type<25k                59153.51    6320.38   9.359  < 2e-16 ***
## type>500k               57669.68    4189.22  13.766  < 2e-16 ***
## type100k-500k           71563.60    4088.55  17.503  < 2e-16 ***
## type25k-100k            66216.82    4507.32  14.691  < 2e-16 ***
## vaxMetric:type<25k        796.58     148.48   5.365 8.70e-08 ***
## vaxMetric:type>500k       757.00      69.15  10.947  < 2e-16 ***
## vaxMetric:type100k-500k   507.44      75.59   6.713 2.25e-11 ***
## vaxMetric:type25k-100k    691.21      98.12   7.045 2.28e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7467000 on 3126 degrees of freedom
##   (8 observations deleted due to missingness)
## Multiple R-squared:  0.9498, Adjusted R-squared:  0.9496 
## F-statistic:  7388 on 8 and 3126 DF,  p-value: < 2.2e-16
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 8 rows containing non-finite values (stat_smooth).

## Warning: Removed 8 rows containing missing values (geom_point).

## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 8 rows containing non-finite values (stat_smooth).

## Warning: Removed 8 rows containing missing values (geom_point).

## 
## Call:
## lm(formula = get(burdenVar) ~ vaxMetric, data = dfReg, weights = pop)
## 
## Weighted Residuals:
##      Min       1Q   Median       3Q      Max 
## -1659513   -23948    48535   133724   882883 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1426.1342    25.7429    55.4   <2e-16 ***
## vaxMetric    -11.6387     0.5363   -21.7   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 177100 on 3132 degrees of freedom
##   (8 observations deleted due to missingness)
## Multiple R-squared:  0.1307, Adjusted R-squared:  0.1305 
## F-statistic:   471 on 1 and 3132 DF,  p-value: < 2.2e-16
## 
## 
## Call:
## lm(formula = get(burdenVar) ~ vaxMetric * type + 0 - vaxMetric, 
##     data = dfReg, weights = pop)
## 
## Weighted Residuals:
##     Min      1Q  Median      3Q     Max 
## -928924  -70829   -5885   74620 1385207 
## 
## Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)    
## type<25k                1963.9575    82.0871  23.925  < 2e-16 ***
## type>500k                865.1035    32.7854  26.387  < 2e-16 ***
## type100k-500k           1475.9119    47.3444  31.174  < 2e-16 ***
## type25k-100k            1865.0472    55.7475  33.455  < 2e-16 ***
## vaxMetric:type<25k       -12.4743     2.3170  -5.384 7.83e-08 ***
## vaxMetric:type>500k       -4.4739     0.6299  -7.102 1.51e-12 ***
## vaxMetric:type100k-500k  -11.4038     1.0048 -11.350  < 2e-16 ***
## vaxMetric:type25k-100k   -12.6591     1.4304  -8.850  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 154600 on 3126 degrees of freedom
##   (8 observations deleted due to missingness)
## Multiple R-squared:  0.8049, Adjusted R-squared:  0.8044 
## F-statistic:  1612 on 8 and 3126 DF,  p-value: < 2.2e-16

Similarities between summed county-level data and state-level data are explored:

# Data for score similarity process
tempStateCompareList_v2 <- compareStateSummedCounty(dfState=readFromRDS("cdc_daily_220416")$dfPerCapita, 
                                                    dfCounty=cty_newdata_20220414$dfPerCapita, 
                                                    inclStates=c(state.abb, "DC"), 
                                                    dateThru="2022-04-10", 
                                                    makePlot=FALSE,
                                                    returnData=TRUE
                                                    )
scoreSimilarity(tempStateCompareList_v2, minDate="2020-02-15", maxDate="2022-04-10", makeFacet=FALSE)

# Example states with meaningful disconnects on 1+ metrics
compareStateSummedCounty(dfState=readFromRDS("cdc_daily_220416")$dfPerCapita, 
                         dfCounty=cty_newdata_20220414$dfPerCapita, 
                         inclStates=c("GA", "FL", "NE", "OK", "KY", "ME", "VT", "WY", "TN", "RI", "MO", "MA", "KS"), 
                         dateThru="2022-04-10", 
                         makePlot=TRUE,
                         returnData=FALSE
                         )
## Warning: Removed 6 row(s) containing missing values (geom_path).

## Warning: Removed 6 row(s) containing missing values (geom_path).

## Warning: Removed 6 row(s) containing missing values (geom_path).

## Warning: Removed 6 row(s) containing missing values (geom_path).

## Warning: Removed 6 row(s) containing missing values (geom_path).

## Warning: Removed 6 row(s) containing missing values (geom_path).

## Warning: Removed 6 row(s) containing missing values (geom_path).

## Warning: Removed 6 row(s) containing missing values (geom_path).

## Warning: Removed 6 row(s) containing missing values (geom_path).

## Warning: Removed 6 row(s) containing missing values (geom_path).

## Warning: Removed 6 row(s) containing missing values (geom_path).

## Warning: Removed 6 row(s) containing missing values (geom_path).

## Warning: Removed 6 row(s) containing missing values (geom_path).

An integrated vaccines dataset is created, along with similarity scores:

allState_20220415 <- integrateStateVaccine(cty_vaxdata_20220415$vaxFix, 
                                           statePerCap=readFromRDS("cdc_daily_220416")$dfPerCapita
                                           )
allState_20220415
## # A tibble: 24,888 x 11
##    state date       vxcpoppct vxcgte18pct vxcgte65pct ctypoppct ctygte18pct
##    <chr> <date>         <dbl>       <dbl>       <dbl>     <dbl>       <dbl>
##  1 AK    2020-12-13        NA          NA          NA         0           0
##  2 AL    2020-12-13        NA          NA          NA         0           0
##  3 AR    2020-12-13        NA          NA          NA         0           0
##  4 AZ    2020-12-13        NA          NA          NA         0           0
##  5 CA    2020-12-13        NA          NA          NA         0           0
##  6 CO    2020-12-13        NA          NA          NA         0           0
##  7 CT    2020-12-13        NA          NA          NA         0           0
##  8 DC    2020-12-13        NA          NA          NA         0           0
##  9 DE    2020-12-13        NA          NA          NA         0           0
## 10 FL    2020-12-13        NA          NA          NA         0           0
## # ... with 24,878 more rows, and 4 more variables: ctygte65pct <dbl>,
## #   pop <dbl>, popgte18 <dbl>, popgte65 <dbl>
vaxDiff_20220415 <- scoreVaxSimilarity(allState_20220415, 
                                       minDate="2021-04-01", 
                                       maxDate="2022-03-31", 
                                       returnBaseData=TRUE
                                       )

Plots of the larger differences are included:

stateAgeVaxEvolution(allState_20220415, 
                     keyState=c("MA", "HI", "TX", "VA", "VT", "GA", "CO", "WV"), 
                     createPlot = TRUE
                     )
## Warning: Removed 1 row(s) containing missing values (geom_path).

scoreVaxSimilarity(allState_20220415, minDate="2022-01-01", maxDate="2022-03-31")

stateAgeVaxEvolution(allState_20220415, keyState=c(state.abb, "DC"), createPlot = FALSE) %>% 
    filter(!complete.cases(.), date >= "2021-01-01") %>% 
    group_by(state, ym=customYYYYMM(date)) %>% 
    summarize(n=n()) %>% 
    pivot_wider(ym, names_from="state", values_from="n") %>% 
    arrange(desc(ym))
## `summarise()` has grouped output by 'state'. You can override using the `.groups` argument.
## # A tibble: 16 x 6
##    ym         CA    HI    MA    TX    VA
##    <chr>   <int> <int> <int> <int> <int>
##  1 2022-04    42    42    42    NA    NA
##  2 2022-03    48    93    93    24    24
##  3 2022-02    NA    84    84    NA    NA
##  4 2022-01    NA    93    93    NA    NA
##  5 2021-12    NA    93    93    NA    NA
##  6 2021-11    NA    90    90    NA    NA
##  7 2021-10    NA    93    93    NA    NA
##  8 2021-09    NA    90    90    NA    NA
##  9 2021-08    NA    93    93    NA    NA
## 10 2021-07    NA    93    93    NA    NA
## 11 2021-06    NA    90    90    NA    NA
## 12 2021-05    NA    93    93    NA    NA
## 13 2021-04    NA    90    90    NA    NA
## 14 2021-03    NA    93    93    NA    NA
## 15 2021-02    NA    84    84    NA    NA
## 16 2021-01    NA    93    93    NA    NA

Function integrateStateVaccine() is updated to better manage sporadic NA values such as in TX and VA:

integrateStateVaccine <- function(vaxCounty, 
                                  statePerCap,
                                  keyStates=c(state.abb, "DC"),
                                  joinType=dplyr::right_join,
                                  treatNAZero=TRUE
                                  ) {
    
    # FUNCTION ARGUMENTS:
    # vaxCounty: processed county-level vaccines data
    # statePerCap: per-capita state level data frame
    # keyStates: states to be included
    # joinType: join to be made (statePerCap is 'left' and summed vaxCounty is 'right')
    # treatNAZero: boolean, should NA values in vaccines be treated as 0 rather than throwing an aggregate NA?
    
    # Create the aggregation function
    if(isTRUE(treatNAZero)) aggFunc <- specNA(sum) else aggFunc <- sum
    
    # Roll county-level data to state
    dfTemp <- vaxCounty %>%
        filter(state %in% all_of(keyStates), FIPS != "UNK") %>%
        group_by(date, state) %>%
        summarize(ctypoppct=aggFunc(vxcpoppct*pop)/sum(pop), 
                  ctygte18pct=aggFunc(vxcgte18pct*popgte18)/sum(popgte18),
                  ctygte65pct=aggFunc(vxcgte65pct*popgte65)/sum(popgte65),
                  across(starts_with("pop"), sum, na.rm=TRUE), 
                  .groups="drop"
        )
    
    # Integrate and return the data
    statePerCap %>%
        select(state, date, vxcpoppct, vxcgte18pct, vxcgte65pct) %>%
        filter(state %in% all_of(keyStates)) %>%
        joinType(dfTemp, by=c("state", "date"))
    
}

# Check that data are the same
all.equal(allState_20220415, 
          integrateStateVaccine(cty_vaxdata_20220415$vaxFix, 
                                statePerCap=readFromRDS("cdc_daily_220416")$dfPerCapita, 
                                treatNAZero=FALSE
                                )
          )
## [1] TRUE
# Check differences in updated data
allState_20220415_v2 <- integrateStateVaccine(cty_vaxdata_20220415$vaxFix, 
                                              statePerCap=readFromRDS("cdc_daily_220416")$dfPerCapita, 
                                              treatNAZero=TRUE
                                              )

# Confirm equality where not NA in original data
all.equal(allState_20220415[complete.cases(allState_20220415), ], 
          allState_20220415_v2[complete.cases(allState_20220415), ]
          )
## [1] TRUE
# Display updated records
allState_20220415_v2[complete.cases(allState_20220415_v2) & !complete.cases(allState_20220415), ] %>%
    mutate(ym=customYYYYMM(date)) %>%
    filter(ym >= "2021-01-01") %>%
    group_by(state, ym) %>%
    summarize(across(where(is.numeric), mean), n=n(), .groups="drop")
## # A tibble: 19 x 12
##    state ym      vxcpoppct vxcgte18pct vxcgte65pct ctypoppct ctygte18pct
##    <chr> <chr>       <dbl>       <dbl>       <dbl>     <dbl>       <dbl>
##  1 CA    2022-03      71.0        80.6        89.0     69.2        78.4 
##  2 CA    2022-04      71.3        80.9        89.4     69.5        78.6 
##  3 MA    2021-02       0           0           0        3.72        4.62
##  4 MA    2021-03      12.2        15.1        32.7     11.8        14.7 
##  5 MA    2021-04      27.7        34.3        68.9     24.6        30.5 
##  6 MA    2021-05      46.0        56.4        80.9     41.0        50.2 
##  7 MA    2021-06      58.2        69.5        85.0     52.2        61.9 
##  8 MA    2021-07      62.9        73.6        86.6     56.1        65.4 
##  9 MA    2021-08      64.9        75.6        87.4     57.9        67.2 
## 10 MA    2021-09      67.0        77.7        88.4     59.9        69.1 
## 11 MA    2021-10      68.8        79.6        89.5     61.5        70.8 
## 12 MA    2021-11      70.4        81.4        91.2     63.0        72.4 
## 13 MA    2021-12      73.3        82.9        92.4     65.7        73.8 
## 14 MA    2022-01      75.6        84.2        93.1     67.7        74.9 
## 15 MA    2022-02      77.1        85.2        93.5     69.1        75.8 
## 16 MA    2022-03      77.9        85.8        93.7     69.9        76.3 
## 17 MA    2022-04      78.4        86.1        94.3     70.2        76.6 
## 18 TX    2022-03      60.6        71.8        86.2     59.9        70.7 
## 19 VA    2022-03      72.4        81.5        91.4     57.0        65.3 
## # ... with 5 more variables: ctygte65pct <dbl>, pop <dbl>, popgte18 <dbl>,
## #   popgte65 <dbl>, n <int>
scoreVaxSimilarity(allState_20220415_v2, minDate="2022-01-01", maxDate="2022-03-31")

County-level burden process mapping is included:

dfRoll20220414 <- createBurdenCountyDate(cty_newdata_20220414, 
                                         maxDate="2022-03-31", 
                                         rollBy=months(c(0, -3, -6, -9)), 
                                         dateSpan=91
                                         )
dfRoll20220414
## # A tibble: 12,568 x 7
##    countyFIPS state asofDate    tdpm    tcpm dpm91  cpm91
##    <chr>      <chr> <date>     <dbl>   <dbl> <dbl>  <dbl>
##  1 01001      AL    2022-03-31 3777. 280209.  913. 84161.
##  2 01003      AL    2022-03-31 3024. 248233.  367. 70917.
##  3 01005      AL    2022-03-31 3929. 229118.  648. 74415.
##  4 01007      AL    2022-03-31 4510. 286595.  268. 85023.
##  5 01009      AL    2022-03-31 4081. 257670.  657. 64106.
##  6 01011      AL    2022-03-31 5148. 227997.  594. 64449.
##  7 01013      AL    2022-03-31 6582. 259718. 1337. 75072.
##  8 01015      AL    2022-03-31 5431. 284785.  748. 80199.
##  9 01017      AL    2022-03-31 4842. 254646.  421. 70578.
## 10 01019      AL    2022-03-31 3245. 195144.  763. 66728.
## # ... with 12,558 more rows
makeBurdenDatePlot(dfRoll20220414, keyVar="cpm91", timeLabel="3-month", varCeiling=100000, varDivBy=1000)

makeBurdenDatePlot(dfRoll20220414, keyVar="dpm91", timeLabel="3-month", varCeiling=1500)

Post-processing steps are consolidated to a function, so that the process includes:

  1. readRunUSAFacts() and spareCountyClusterMap()
  2. processCountyVaccines()
  3. postProcessCountyData()
  4. Individual functions as needed

The postProcessCountyData() includes:

# Updated to include date range
scoreVaxSimilarity <- function(df, 
                               keyStates=c(state.abb, "DC"), 
                               minDate=NULL, 
                               maxDate=NULL, 
                               returnBaseData=FALSE
                               ) {
    
    # FUNCTION ARGUMENTS:
    # df: a processed file from integrateStateVaccine()
    # keyStates: states to include
    # minDate: earliest date to use for scoring (NULL means use all)
    # maxDate: latest date to use for scoring (NULL means use all)
    # returnBaseData: boolean, should dfBase be returned?
    
    # Set minDate and maxDate if NULL
    if(is.null(minDate)) minDate <- min(df$date, na.rm=TRUE)
    if(is.null(maxDate)) maxDate <- max(df$date, na.rm=TRUE)
    
    dfBase <- stateAgeVaxEvolution(df, keyState=keyStates, createPlot = FALSE) %>%
        mutate(value=ifelse(is.na(value), 0, value), src=ifelse(src=="State", "cdcState", "ctySum")) %>%
        pivotData(c("state", "date", "age"), nameVar="src", toLonger=FALSE) %>%
        filter(date >= minDate, date <= maxDate) %>%
        mutate(ym=customYYYYMM(date)) %>%
        group_by(state, ym, age) %>%
        summarize(n=n(), 
                  rmse=sqrt(mean((cdcState-ctySum)**2)), 
                  cdcState=mean(cdcState), 
                  ctySum=mean(ctySum), 
                  .groups="drop"
        )
    p1 <- dfBase %>%
        group_by(state, age) %>% 
        summarize(rmse=mean(rmse), .groups="drop") %>% 
        ggplot(aes(x=fct_reorder(state, rmse), y=rmse)) + 
        geom_col(fill="lightblue") + 
        coord_flip() + 
        facet_wrap(~age) + 
        labs(x=NULL, 
             y="RMSE (Vaccinated by State/Age difference by source)", 
             title="Difference in vaccinated by state data by source", 
             subtitle=paste0("Date range: ", minDate, " to ", maxDate)
        )
    print(p1)
    
    if(isTRUE(returnBaseData)) return(dfBase)
    
}

# Updated to handle lst passed as a perCapita file
makeBurdenSummary <- function(lst, 
                              groupVar=c("countyFIPS", "state"), 
                              numVarFinal=c("tdpm", "tcpm"), 
                              numVarSum=c("dpm", "cpm"), 
                              keyDate=NULL,
                              dateRange=28
                              ) {
    
    # FUNCTION ARGUMENTS
    # lst: list of processed county burden data (or extracted dfPerCapita file)
    # groupVar: grouping variables for the final dataset
    # numVarFinal: numeric variables to pull data from the key date
    # numVarSum: numeric variables to sum from the key date interval
    # keyDate: the key date for the summaries (NULL means use maximum in data)
    # dateRange: number of days to include in the numeric interval summaries
    
    # Extract perCapita data if passed as a list
    if("list" %in% class(lst)) lst <- lst[["dfPerCapita"]]
    
    # Find keyDate if not provided, convert to date if not already
    if(is.null(keyDate)) keyDate <- lst %>% pull(date) %>% max()
    if(!("Date" %in% class(keyDate))) keyDate <- as.Date(keyDate)
    
    # Create summary
    df <- lst %>% 
        group_by_at(all_of(groupVar)) %>%
        summarize(asofDate=keyDate, 
                  across(all_of(numVarFinal), .fns=~sum(ifelse(date==keyDate, .x, 0))),
                  across(all_of(numVarSum), 
                         .fns=~sum(ifelse(date>keyDate-dateRange & date<=keyDate, .x, 0)), 
                         .names=paste0("{.col}", as.character(dateRange))
                  ),
                  .groups="drop"
        )
    
    # Return the data frame
    df
    
}

postProcessCountyData <- function(lstCtyBurden,
                                  lstCtyVax,
                                  lstState, 
                                  maxDate=NULL, 
                                  minDateBurden="2020-02-15", 
                                  minDateVax="2021-04-01"
                                  ) {

    # FUNCTION ARGUMENTS:
    # lstCtyBurden: list of processed county-level burden data (or a dfPerCapita file from this list)
    # lstCtyVax: list of processed county-level vaccines data (or a vaxFix file from this list)
    # lstState: list of processed state-level burden data (or a dfPerCapita file from this list)
    # maxDate: maximum date to use for plotting (NULL means latest date in both lstCty and lstState)
    # minDateBurden: earliest date for scoring burden similarity across files
    # minDateVax: earliest date for scoring vaccine similarity across files
    
    # Extract the relevant perCapita and vaxFix data if needed
    if("list" %in% class(lstCtyBurden)) lstCtyBurden <- lstCtyBurden[["dfPerCapita"]]
    if("list" %in% class(lstState)) lstState <- lstState[["dfPerCapita"]]
    if("list" %in% class(lstCtyVax)) lstCtyVax <- lstCtyVax[["vaxFix"]]
    
    # Get maxDate if not provided
    if(is.null(maxDate)) maxDate <- min(max(lstCtyBurden$date, na.rm=TRUE), max(lstState$date, na.rm=TRUE))
    cat("\nParameter maxDate is:", as.character(maxDate), "\n\n")
    
    # Data for score similarity process
    dfCompare <- compareStateSummedCounty(dfState=lstState, 
                                          dfCounty=lstCtyBurden, 
                                          inclStates=c(state.abb, "DC"), 
                                          dateThru=maxDate, 
                                          makePlot=FALSE,
                                          returnData=TRUE
                                          )
    scoreSimilarity(dfCompare, minDate=minDateBurden, maxDate=maxDate, makeFacet=FALSE)

    # Check differences in data sources
    dfAllState <- integrateStateVaccine(lstCtyVax, statePerCap=lstState, treatNAZero=TRUE)
    vaxDiff <- scoreVaxSimilarity(dfAllState, minDate=minDateVax, maxDate=maxDate, returnBaseData=TRUE)

    # Create county-level burden data by quarters
    dfRoll91 <- createBurdenCountyDate(lstCtyBurden, 
                                       maxDate=maxDate, 
                                       rollBy=months(c(0, -3, -6, -9)), 
                                       dateSpan=91
                                       )
    makeBurdenDatePlot(dfRoll91, keyVar="cpm91", timeLabel="3-month", varCeiling=100000, varDivBy=1000) %>% print()
    makeBurdenDatePlot(dfRoll91, keyVar="dpm91", timeLabel="3-month", varCeiling=1500) %>% print()

    # Return the key elements
    list(dfCompare=dfCompare, dfAllState=dfAllState, vaxDiff=vaxDiff, dfRoll91=dfRoll91)
    
}

cty_postdata_20220414 <- postProcessCountyData(lstCtyBurden=cty_newdata_20220414$dfPerCapita, 
                                               lstCtyVax=cty_vaxdata_20220415$vaxFix, 
                                               lstState=readFromRDS("cdc_daily_220416")$dfPerCapita
                                               )
## 
## Parameter maxDate is: 2022-04-11

And stand-alone functions include:

# 1. Plot states with meaningful disconnects on 1+ metrics
compareStateSummedCounty(dfState=readFromRDS("cdc_daily_220416")$dfPerCapita,
                         dfCounty=cty_newdata_20220414$dfPerCapita,
                         inclStates=c("GA", "FL", "NE", "OK", "KY", "ME", "VT", "WY", "TN", "RI", "MO", "MA", "KS"),
                         dateThru="2022-04-10",
                         makePlot=TRUE,
                         returnData=FALSE
                         )
## Warning: Removed 6 row(s) containing missing values (geom_path).

## Warning: Removed 6 row(s) containing missing values (geom_path).

## Warning: Removed 6 row(s) containing missing values (geom_path).

## Warning: Removed 6 row(s) containing missing values (geom_path).

## Warning: Removed 6 row(s) containing missing values (geom_path).

## Warning: Removed 6 row(s) containing missing values (geom_path).

## Warning: Removed 6 row(s) containing missing values (geom_path).

## Warning: Removed 6 row(s) containing missing values (geom_path).

## Warning: Removed 6 row(s) containing missing values (geom_path).

## Warning: Removed 6 row(s) containing missing values (geom_path).

## Warning: Removed 6 row(s) containing missing values (geom_path).

## Warning: Removed 6 row(s) containing missing values (geom_path).

## Warning: Removed 6 row(s) containing missing values (geom_path).

# 2. Plot differences in vaccines data if needed
stateAgeVaxEvolution(cty_postdata_20220414$dfAllState,
                     keyState=c("MA", "HI", "TX", "VA", "VT", "GA", "CO", "WV"),
                     createPlot = TRUE
                     )
## Warning: Removed 1 row(s) containing missing values (geom_path).

# 3. Check vaccine similarity scoring on a different time period
scoreVaxSimilarity(cty_postdata_20220414$dfAllState, minDate="2022-01-01", maxDate="2022-03-31")

# 4. Check for states with missing data (after adjustments, should only be HI which does not report by county)
stateAgeVaxEvolution(cty_postdata_20220414$dfAllState, keyState=c(state.abb, "DC"), createPlot = FALSE) %>%
    filter(!complete.cases(.), date >= "2021-01-01") %>%
    group_by(state, ym=customYYYYMM(date), .groups="drop") %>%
    summarize(n=n()) %>%
    pivot_wider(ym, names_from="state", values_from="n") %>%
    arrange(desc(ym))
## `summarise()` has grouped output by 'state', 'ym'. You can override using the `.groups` argument.
## # A tibble: 16 x 2
## # Groups:   ym [16]
##    ym         HI
##    <chr>   <int>
##  1 2022-04    42
##  2 2022-03    93
##  3 2022-02    84
##  4 2022-01    93
##  5 2021-12    93
##  6 2021-11    90
##  7 2021-10    93
##  8 2021-09    90
##  9 2021-08    93
## 10 2021-07    93
## 11 2021-06    90
## 12 2021-05    93
## 13 2021-04    90
## 14 2021-03    93
## 15 2021-02    84
## 16 2021-01    93
# 5. Additional rolling data as needed
dfRoll_28 <- createBurdenCountyDate(cty_newdata_20220414,
                                    maxDate="2022-04-10",
                                    rollBy=months(c(0, -1, -2, -3)),
                                    dateSpan=28
                                    )
makeBurdenDatePlot(dfRoll_28, keyVar="cpm28", timeLabel="1-month", varCeiling=50000, varDivBy=1000)

makeBurdenDatePlot(dfRoll_28, keyVar="dpm28", timeLabel="1-month", varCeiling=500)

Function compareStateSummedCounty() is updated to accept a processed frame:

pivotStateBurdenData <- function(df, 
                                 inclStates, 
                                 varKeep=c("date", "state", "tot_cases", "new_cases", "tot_deaths", "new_deaths"),
                                 varRename=c("tot_cases"="cases", "tot_deaths"="deaths"),
                                 dateThru=NULL
                                 ) {
    
    # FUNCTION ARGUMENTS:
    # df: a state-level burden data frame
    # inclStates: states to be included
    # varKeep: variables to be kept from state burden frame
    # varRename: variables to be renamed
    # dateThru: maximum date for the analysis (NULL means use max(date) from df)
    
    # Create and return pivoted state-level data
    df %>%
        colSelector(vecSelect=varKeep) %>%
        colRenamer(vecRename=varRename) %>%
        filter(state %in% all_of(inclStates), 
               date <= if(is.null(dateThru)) max(date) else as.Date(dateThru)
               ) %>%
        pivot_longer(-c(date, state)) %>%
        mutate(value=ifelse(is.na(value), 0, value)) %>%
        arrange(date, state, name) %>%
        group_by(state, name) %>%
        mutate(value7=zoo::rollmean(value, k=7, fill=NA, align="center"), src="state") %>%
        ungroup()
    
}

createSummedCountyBurdenData <- function(df, 
                                         inclStates, 
                                         varKeep=c("state", "countyFIPS", "date", 
                                                   "cases", "new_cases", "deaths", "new_deaths"
                                                   ),
                                         varRename=c(),
                                         dateThru=NULL
                                 ) {
    
    # FUNCTION ARGUMENTS:
    # df: a state-level burden data frame
    # inclStates: states to be included
    # varKeep: variables to be kept from state burden frame
    # varRename: variables to be renamed
    # dateThru: maximum date for the analysis (NULL means use max(date) from df)
    
    # Create and return summed and pivoted county-level data
    df %>%
        filter(state %in% all_of(inclStates), 
               date <= if(is.null(dateThru)) max(date) else as.Date(dateThru)
               ) %>%
        colSelector(vecSelect=varKeep) %>%
        colRenamer(vecRename=varRename) %>%
        pivot_longer(-c(state, countyFIPS, date)) %>%
        arrange(state, countyFIPS, date) %>%
        group_by(state, countyFIPS, name) %>%
        mutate(value7=zoo::rollmean(value, k=7, fill=NA, align="center")) %>%
        ungroup() %>%
        filter(state %in% all_of(inclStates)) %>%
        group_by(state, date, name) %>%
        summarize(across(c(value, value7), .fns=sum), .groups="drop") %>%
        filter(!is.na(value7)) %>%
        mutate(src="county")
    
}

compareStateSummedCounty <- function(dfState=NULL, 
                                     dfCounty=NULL,
                                     lstAll=NULL,
                                     aggData=FALSE,
                                     inclStates=if(isTRUE(aggData)) c(state.abb, "DC") else NULL, 
                                     dateThru=NULL, 
                                     makePlot=TRUE, 
                                     createData=TRUE,
                                     returnData=FALSE
                                     ) {
    
    # FUNCTION ARGUMENTS:
    # dfState: processed state-level metrics
    # dfCounty: processed county-level metrics
    # lstAll: list containing processed dfState and dfCounty
    # aggData: boolean, should data be aggregated (FALSE means one plot series per state)
    # inclStates: character vector of states to include
    # dateThru: latest date for analyzsis (NULL means use all data)
    # makePlot: boolean, should plots be created for each state?
    # createData: boolean, does dfAll need to be built from dfState and dfCounty? FALSE means use dfAll
    # returnData: boolean, should a list of processed dfState and processed dfCounty be returned?
    
    # Check that data passed matches parameters
    if(isTRUE(createData)) {
        if(is.null(dfState) | is.null(dfCounty)) stop("\nNeed to pass dfState and dfCounty when createData=TRUE\n")
        if(!is.null(lstAll)) cat("\nlstAll is ignored, to be built from dfState and dfCounty when createData=TRUE\n")
    }
    if(!isTRUE(createData)) {
        if(is.null(lstAll)) stop("\nNeed to pass lstAll when createData=FALSE\n")
        if(!is.null(dfState)) cat("\ndfState ignored; using lstAll due to createData=FALSE\n")
        if(!is.null(dfCounty)) cat("\ndfCounty ignored; using lstAll due to createData=FALSE\n")
    }
    
    # Check that at least one state has been passed
    if(is.null(inclStates) | length(inclStates)==0) {
        cat("\nNo states passed to compareStateSummedCounty, returning without running function\n")
        return()
    }
    
    # Create or extract data as directed by parameter createData
    if(isTRUE(createData)) {
        dfState <- pivotStateBurdenData(dfState, inclStates=inclStates, dateThru=dateThru)
        dfCounty <- createSummedCountyBurdenData(dfCounty, inclState=inclStates, dateThru=dateThru)
    } else {
        dfState <- lstAll[["dfState"]]
        dfCounty <- lstAll[["dfCounty"]]
    }
    
    # If data are to be aggregated, perform the aggregation
    if(isTRUE(aggData)) {
        tempAgg <- function(df) {
            df %>% 
                group_by(date, name, src) %>% 
                summarize(value=specNA(sum)(value, na.rm=TRUE), 
                          value7=specNA(sum)(value7, na.rm=TRUE), 
                          .groups="drop"
                          ) %>% 
                mutate(state="Aggregated")
        }
        dfState <- tempAgg(dfState)
        dfCounty <- tempAgg(dfCounty)
        inclStates <- "Aggregated"
    }
    
    if(isTRUE(makePlot)) {
        
        for(thisState in all_of(inclStates)) {
            
            p1 <- dfCounty %>%
                filter(state %in% all_of(thisState)) %>%
                bind_rows(filter(dfState, state %in% all_of(thisState))) %>%
                ggplot(aes(x=date, y=value7)) + 
                geom_line(aes(group=src, color=src)) + 
                facet_wrap(~name, scales="free_y") + 
                labs(x=NULL, 
                     y="Rolling 7-day mean", 
                     title="Summed county burden by metric (7-day mean)", 
                     subtitle=paste0(thisState, " counties")
                ) + 
                scale_color_discrete("Source")
            print(p1)
            
        }
    }
    
    if(returnData) list(dfState=dfState, dfCounty=dfCounty)
    
}

# Check that results are the same when creating the full frame (default)
chk1 <- compareStateSummedCounty(dfState=readFromRDS("cdc_daily_220416")$dfPerCapita, 
                                 dfCounty=cty_newdata_20220414$dfPerCapita, 
                                 inclStates=c(state.abb, "DC"), 
                                 dateThru="2022-04-11", 
                                 makePlot=FALSE,
                                 returnData=TRUE
                                 )
identical(chk1, cty_postdata_20220414$dfCompare)
## [1] TRUE
# Check that results are the same when using the existing frame
chk2 <- compareStateSummedCounty(lstAll=cty_postdata_20220414$dfCompare,
                                 inclStates=c(state.abb, "DC"), 
                                 dateThru="2022-04-11", 
                                 createData=FALSE,
                                 makePlot=FALSE,
                                 returnData=TRUE
                                 )
identical(chk1, chk2)
## [1] TRUE
# Example plotting for select states
compareStateSummedCounty(lstAll=cty_postdata_20220414$dfCompare,
                         inclStates=c("GA", "FL", "NE"),
                         dateThru="2022-04-11",
                         createData=FALSE,
                         makePlot=TRUE,
                         returnData=FALSE
                         )
## Warning: Removed 6 row(s) containing missing values (geom_path).

## Warning: Removed 6 row(s) containing missing values (geom_path).

## Warning: Removed 6 row(s) containing missing values (geom_path).

An overall function for additional post-processing is written:

additionalCountyPostProcess <- function(lstPost, 
                                        p1CompareStates=c(), 
                                        p1AggData=FALSE, 
                                        p2VaxStates=c(), 
                                        p3VaxTimes=c(), 
                                        p4DF=NULL,
                                        p4MaxDate=NULL, 
                                        p4RollBy=months(c(0, -1, -2, -3)),
                                        p4DateSpan=28, 
                                        p4CPMCeiling=50000, 
                                        p4DPMCeiling=500
                                        ) {
    
    # FUNCTION ARGUMENTS:
    # lstPost: list of post-processed county data
    # p1CompareStates: states that should be compared vs. summed county
    # p1AggData: boolean, should the comparison states all be aggregated to a single comparison?
    # p2VaxStates: states that should be compared for vaccine evolution
    # p3VaxTimes: character vector of form c(minDate, maxDate) for time period to score vaccine similarity
    # p4DF: data frame for creating rolling data (NULL means do not run)
    # p4MaxDate: maximum date for rolling analysis (NULL means use maximum date in p4DF minus 1 day)
    # p4RollBy: time periods to roll back for analysis
    # p4DateSpan: size of windows for rolling analysis
    # p4CPMCeiling: ceiling for plots on CPM (all values at or above this will be the same color)
    # p4DPMCeiling: ceiling for plots on DPM (all values at or above this will be the same color)
    
    # 1. Plotting state vs. summed county for key states
    if(length(p1CompareStates) > 0) {
        compareStateSummedCounty(lstAll=lstPost[["dfCompare"]], 
                                 inclStates=p1CompareStates, 
                                 createData=FALSE, 
                                 aggData=p1AggData
                                 )
    }

    # 2. Plot differences in vaccines data if needed
    if(length(p2VaxStates) > 0) 
        stateAgeVaxEvolution(lstPost[["dfAllState"]], keyState=p2VaxStates)

    # 3. Check vaccine similarity scoring on a different time period
    if(length(p3VaxTimes) > 0) {
        if(length(p3VaxTimes) != 2 | p3VaxTimes[2] < p3VaxTimes[1]) 
            cat("\np3VaxTimes should be c(minDate, maxDate), skipping this step due to bad parameter\n")
        else 
            scoreVaxSimilarity(lstPost[["dfAllState"]], minDate=p3VaxTimes[1], maxDate=p3VaxTimes[2])
    }

    # 4. Additional rolling data as needed
    if(!is.null(p4DF)) {
        if(is.null(p4MaxDate)) p4MaxDate <- max(p4DF$date) - lubridate::days(1)
        dfRoll <- createBurdenCountyDate(p4DF, maxDate=p4MaxDate, rollBy=p4RollBy, dateSpan=p4DateSpan)
        makeBurdenDatePlot(dfRoll, 
                           keyVar=paste0("cpm", p4DateSpan), 
                           timeLabel=paste0(p4DateSpan, "-day"), 
                           varCeiling=p4CPMCeiling, 
                           varDivBy=1000
                           ) %>%
            print()
        makeBurdenDatePlot(dfRoll, 
                           keyVar=paste0("dpm", p4DateSpan), 
                           timeLabel=paste0(p4DateSpan, "-day"), 
                           varCeiling=p4DPMCeiling
                           ) %>%
            print()
    }
    
}

# Nothing will run
additionalCountyPostProcess(cty_postdata_20220414)

# Just burden comparisons
additionalCountyPostProcess(cty_postdata_20220414, p1CompareStates=c("GA", "FL", "NE"))
## Warning: Removed 6 row(s) containing missing values (geom_path).

## Warning: Removed 6 row(s) containing missing values (geom_path).

## Warning: Removed 6 row(s) containing missing values (geom_path).

additionalCountyPostProcess(cty_postdata_20220414, p1CompareStates=c(state.abb, "DC"), p1AggData=TRUE)
## Warning: Removed 6 row(s) containing missing values (geom_path).

# Just vaccine comparisons
additionalCountyPostProcess(cty_postdata_20220414, p2VaxStates=c("MA", "HI", "TX"))
## Warning: Removed 1465 row(s) containing missing values (geom_path).

# Just scoring updates (and errors)
additionalCountyPostProcess(cty_postdata_20220414, p3VaxTimes=c("2022-01-01"))
## 
## p3VaxTimes should be c(minDate, maxDate), skipping this step due to bad parameter
additionalCountyPostProcess(cty_postdata_20220414, p3VaxTimes=c("2022-04-10", "2022-01-01"))
## 
## p3VaxTimes should be c(minDate, maxDate), skipping this step due to bad parameter
additionalCountyPostProcess(cty_postdata_20220414, p3VaxTimes=sort(c("2022-04-10", "2022-01-01")))

# Just new rolling data
additionalCountyPostProcess(p4DF=cty_newdata_20220414$dfPerCapita)

additionalCountyPostProcess(p4DF=cty_newdata_20220414$dfPerCapita, 
                            p4RollBy=months(c(0, -3, -6, -9)), 
                            p4DateSpan=91, 
                            p4CPMCeiling=100000, 
                            p4DPMCeiling=1000
                            )

The latest county-level burden data are downloaded:

urlMapper[["usafCase"]] <- "https://static.usafacts.org/public/data/covid-19/covid_confirmed_usafacts.csv"
urlMapper[["usafDeath"]] <- "https://static.usafacts.org/public/data/covid-19/covid_deaths_usafacts.csv"
urlMapper[["usafPop"]] <- "https://static.usafacts.org/public/data/covid-19/covid_county_population_usafacts.csv"

readList <- list("usafCase"="./RInputFiles/Coronavirus/covid_confirmed_usafacts_downloaded_20220507.csv", 
                 "usafDeath"="./RInputFiles/Coronavirus/covid_deaths_usafacts_downloaded_20220507.csv"
                 )
compareList <- list("usafCase"=readFromRDS("cty_newdata_20220414")$dfRaw$usafCase, 
                    "usafDeath"=readFromRDS("cty_newdata_20220414")$dfRaw$usafDeath
                    )

# Use existing clusters
cty_newdata_20220507 <- readRunUSAFacts(maxDate="2022-05-05", 
                                        downloadTo=lapply(readList, 
                                                          FUN=function(x) if(file.exists(x)) NA else x
                                                          ),
                                        readFrom=readList, 
                                        compareFile=compareList, 
                                        writeLog="./RInputFiles/Coronavirus/USAF_NewData_20220414_chk_v005.log", 
                                        ovrwriteLog=TRUE,
                                        useClusters=readFromRDS("cty_newdata_20210813")$useClusters,
                                        skipAssessmentPlots=FALSE,
                                        brewPalette="Paired"
                                        )
## 
## -- Column specification --------------------------------------------------------
## cols(
##   .default = col_double(),
##   `County Name` = col_character(),
##   State = col_character(),
##   StateFIPS = col_character()
## )
## i Use `spec()` for the full column specifications.
## 
## *** File has been checked for uniqueness by: countyFIPS countyName state stateFIPS 
## 
## 
## *** File has been checked for uniqueness by: countyFIPS stateFIPS date

## 
## 
## Checking for similarity of: column names
## In reference but not in current: 
## In current but not in reference: 
## 
## Checking for similarity of: date
## In reference but not in current: 0
## In current but not in reference: 24
## Detailed differences available in: ./RInputFiles/Coronavirus/USAF_NewData_20220414_chk_v005.log
## 
## Checking for similarity of: county
## In reference but not in current: 
## In current but not in reference:

## 
## 
## ***Differences of at least 5 and at least 5%
## 
## 5 records
## Detailed output available in log: ./RInputFiles/Coronavirus/USAF_NewData_20220414_chk_v005.log
## 
## 
## ***Differences of at least 0 and at least 0.1%
## 
## 95 records
## Detailed output available in log: ./RInputFiles/Coronavirus/USAF_NewData_20220414_chk_v005.log
## 
## -- Column specification --------------------------------------------------------
## cols(
##   .default = col_double(),
##   `County Name` = col_character(),
##   State = col_character(),
##   StateFIPS = col_character()
## )
## i Use `spec()` for the full column specifications.

## 
## *** File has been checked for uniqueness by: countyFIPS countyName state stateFIPS 
## 
## 
## *** File has been checked for uniqueness by: countyFIPS stateFIPS date

## 
## 
## Checking for similarity of: column names
## In reference but not in current: 
## In current but not in reference: 
## 
## Checking for similarity of: date
## In reference but not in current: 0
## In current but not in reference: 24
## Detailed differences available in: ./RInputFiles/Coronavirus/USAF_NewData_20220414_chk_v005.log
## 
## Checking for similarity of: county
## In reference but not in current: 
## In current but not in reference:

## 
## 
## ***Differences of at least 5 and at least 5%
## 
## 4 records
## Detailed output available in log: ./RInputFiles/Coronavirus/USAF_NewData_20220414_chk_v005.log

## 
## 
## ***Differences of at least 0 and at least 0.1%
## 
## 88 records
## Detailed output available in log: ./RInputFiles/Coronavirus/USAF_NewData_20220414_chk_v005.log
## 
## 
## Column sums before and after applying filtering rules:
## # A tibble: 3 x 4
##   isType    cases new_cases            n
##   <chr>     <dbl>     <dbl>        <dbl>
## 1 before 2.40e+10   7.96e+7 2666155     
## 2 after  2.39e+10   7.91e+7 2623570     
## 3 pctchg 5.37e- 3   6.20e-3       0.0160
## 
## 
## Column sums before and after applying filtering rules:
## # A tibble: 3 x 4
##   isType  deaths  new_deaths            n
##   <chr>    <dbl>       <dbl>        <dbl>
## 1 before 3.87e+8 990515      2666155     
## 2 after  3.77e+8 940572      2623570     
## 3 pctchg 2.55e-2      0.0504       0.0160

## NULL

# Plot all counties based on closest cluster
sparseCountyClusterMap(cty_newdata_20220507$useClusters, 
                       caption="Includes only counties with 25k+ population",
                       brewPalette="viridis"
                       )

# Save the refreshed file
saveToRDS(cty_newdata_20220507, ovrWriteError=FALSE)

Vaccines data are also updated:

cty_vaxdata_20220508 <- processCountyVaccines(loc="./RInputFiles/Coronavirus/county_vaccine_20220508.csv", 
                                              ctyList=cty_newdata_20220507, 
                                              minDateCD=c("2022-02-01", "2022-02-01"),
                                              maxDateCD="2022-04-30"
                                              )
## 
## -- Column specification --------------------------------------------------------
## cols(
##   .default = col_double(),
##   Date = col_character(),
##   FIPS = col_character(),
##   Recip_County = col_character(),
##   Recip_State = col_character(),
##   SVI_CTGY = col_character(),
##   Metro_status = col_character()
## )
## i Use `spec()` for the full column specifications.
## 
## Records from other than 50 states and DC:
## # A tibble: 10 x 2
##    state     n
##    <chr> <int>
##  1 AS      502
##  2 FM      507
##  3 GU     1017
##  4 MH      494
##  5 MP      504
##  6 PR    40365
##  7 PW      501
##  8 UNK     308
##  9 VI     2034
## 10 <NA>     34

## Warning: Removed 16 rows containing non-finite values (stat_boxplot).

## Warning: Removed 16 rows containing non-finite values (stat_boxplot).

## Warning: Removed 16 rows containing non-finite values (stat_boxplot).

## 
## Count of NA records by column
##           state            FIPS popgte65_minpop popgte65_maxpop    popgte65_nnA 
##               0               0               0               0               0 
##               n 
##               0 
## 
## Records where minimum and maximum population differ# A tibble: 0 x 5
## # ... with 5 variables: state <chr>, FIPS <chr>, age <chr>, minpop <dbl>,
## #   maxpop <dbl>
## 
## 
## 
## Will run with parameters:
## burdenVar: cpm dpm 
## vaxVar: vxcpoppct vxcpoppct 
## minDateCD: 2022-02-01 2022-02-01 
## maxDateCD: 2022-04-30 2022-04-30
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 8 rows containing non-finite values (stat_smooth).
## Warning: Removed 8 rows containing missing values (geom_point).

## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 8 rows containing non-finite values (stat_smooth).

## Warning: Removed 8 rows containing missing values (geom_point).

## 
## Call:
## lm(formula = get(burdenVar) ~ vaxMetric, data = dfReg, weights = pop)
## 
## Weighted Residuals:
##       Min        1Q    Median        3Q       Max 
## -16913246   -865056   -142320   1156343  29100686 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 18410.802    845.223  21.782   <2e-16 ***
## vaxMetric       5.315     13.655   0.389    0.697    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2841000 on 3132 degrees of freedom
##   (8 observations deleted due to missingness)
## Multiple R-squared:  4.837e-05,  Adjusted R-squared:  -0.0002709 
## F-statistic: 0.1515 on 1 and 3132 DF,  p-value: 0.6971
## 
## 
## Call:
## lm(formula = get(burdenVar) ~ vaxMetric * type + 0 - vaxMetric, 
##     data = dfReg, weights = pop)
## 
## Weighted Residuals:
##       Min        1Q    Median        3Q       Max 
## -15433621  -1306806   -474319    813529  28327596 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## type<25k                15900.16    2746.17   5.790 7.74e-09 ***
## type>500k                8211.07    1738.47   4.723 2.42e-06 ***
## type100k-500k            8147.59    1751.84   4.651 3.44e-06 ***
## type25k-100k            15344.40    1967.89   7.797 8.54e-15 ***
## vaxMetric:type<25k        136.60      57.61   2.371   0.0178 *  
## vaxMetric:type>500k       135.40      25.81   5.246 1.66e-07 ***
## vaxMetric:type100k-500k   186.48      29.33   6.358 2.34e-10 ***
## vaxMetric:type25k-100k    123.48      38.62   3.197   0.0014 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2753000 on 3126 degrees of freedom
##   (8 observations deleted due to missingness)
## Multiple R-squared:  0.8307, Adjusted R-squared:  0.8302 
## F-statistic:  1917 on 8 and 3126 DF,  p-value: < 2.2e-16
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 8 rows containing non-finite values (stat_smooth).

## Warning: Removed 8 rows containing missing values (geom_point).

## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 8 rows containing non-finite values (stat_smooth).

## Warning: Removed 8 rows containing missing values (geom_point).

## 
## Call:
## lm(formula = get(burdenVar) ~ vaxMetric, data = dfReg, weights = pop)
## 
## Weighted Residuals:
##     Min      1Q  Median      3Q     Max 
## -524335  -23570    1256   31450  513584 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 856.3315    17.5443   48.81   <2e-16 ***
## vaxMetric    -9.4468     0.2834  -33.33   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 58980 on 3132 degrees of freedom
##   (8 observations deleted due to missingness)
## Multiple R-squared:  0.2618, Adjusted R-squared:  0.2616 
## F-statistic:  1111 on 1 and 3132 DF,  p-value: < 2.2e-16
## 
## 
## Call:
## lm(formula = get(burdenVar) ~ vaxMetric * type + 0 - vaxMetric, 
##     data = dfReg, weights = pop)
## 
## Weighted Residuals:
##     Min      1Q  Median      3Q     Max 
## -520669  -28168   -3747   27767  539544 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## type<25k                738.7731    58.3126  12.669  < 2e-16 ***
## type>500k               803.2666    36.9150  21.760  < 2e-16 ***
## type100k-500k           781.5071    37.1988  21.009  < 2e-16 ***
## type25k-100k            754.0531    41.7865  18.045  < 2e-16 ***
## vaxMetric:type<25k       -5.8518     1.2233  -4.783 1.80e-06 ***
## vaxMetric:type>500k      -8.8002     0.5481 -16.057  < 2e-16 ***
## vaxMetric:type100k-500k  -8.3533     0.6228 -13.413  < 2e-16 ***
## vaxMetric:type25k-100k   -6.6905     0.8201  -8.159 4.86e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 58460 on 3126 degrees of freedom
##   (8 observations deleted due to missingness)
## Multiple R-squared:  0.7375, Adjusted R-squared:  0.7368 
## F-statistic:  1098 on 8 and 3126 DF,  p-value: < 2.2e-16
# Save the refreshed file
saveToRDS(cty_vaxdata_20220508, ovrWriteError=FALSE)

County-level data are post-processed:

cty_postdata_20220507 <- postProcessCountyData(lstCtyBurden=cty_newdata_20220507$dfPerCapita, 
                                               lstCtyVax=cty_vaxdata_20220508$vaxFix, 
                                               lstState=readFromRDS("cdc_daily_220501")$dfPerCapita
                                               )
## 
## Parameter maxDate is: 2022-04-30

# Save the refreshed file
saveToRDS(cty_postdata_20220507, ovrWriteError=FALSE)

Additional post-processing steps are run:

# Step 1a: Burden comparisons for aggregated states
additionalCountyPostProcess(cty_postdata_20220507, p1CompareStates=c(state.abb, "DC"), p1AggData=TRUE)
## Warning: Removed 6 row(s) containing missing values (geom_path).

# Step 1: Burden aggregation for key states
# Step 2: vaccine comparisons
# Step 3: Scoring updates (and errors)
# Step 4: New rolling data (28-day default with ceilings 50000 CPM, 500 DPM)
additionalCountyPostProcess(cty_postdata_20220507, 
                            p1CompareStates=c("GA", "FL", "NE"), 
                            p2VaxStates=c("MA", "HI", "TX", "VA", "VT", "GA", "CO", "SD"), 
                            p3VaxTimes=sort(c("2021-05-01", "2022-04-30")),
                            p4DF=cty_newdata_20220507$dfPerCapita
                            )
## Warning: Removed 6 row(s) containing missing values (geom_path).

## Warning: Removed 6 row(s) containing missing values (geom_path).

## Warning: Removed 6 row(s) containing missing values (geom_path).

## Warning: Removed 8 row(s) containing missing values (geom_path).

Evolution of burden and vaccines is compared for select counties:

# Sample list of counties
ctyList <- c("01089", "29510", "34027")

# Extract name and population data
tmpNamePop <- cty_newdata_20220507$countyData %>%
    filter(countyFIPS %in% ctyList) %>%
    mutate(fullName=paste0(countyName, ", ", state, " (pop: ", round(pop/1000), "k)"))

# Extract burden per-capita data
tmpBurden <- cty_newdata_20220507$dfPerCapita %>%
    filter(countyFIPS %in% ctyList)

# Extract vaccines per-capita data
tmpVax <- cty_vaxdata_20220508$vaxFix %>%
    filter(FIPS %in% ctyList)

# Create integrated database of burden and vaccines
tmpCombine <- tmpBurden %>%
    select(countyFIPS, date, cpm7, dpm7, tcpm7, tdpm7) %>%
    full_join(tmpVax %>% select(countyFIPS=FIPS, date, vxcpoppct, vxcgte18pct, vxcgte65pct), 
              by=c("countyFIPS", "date")
              ) %>%
    pivot_longer(-c(countyFIPS, date))

# Create plot from data
tmpNamePopVec <- tmpNamePop$fullName %>% purrr::set_names(tmpNamePop$countyFIPS)
tmpVars <- c("dpm7", "tdpm7", "cpm7", "tcpm7", "vxcpoppct", "vxcgte65pct")
tmpVarNames <- c("1. Death per million", "2. Death per million (cum)", 
                 "3. Case per million", "4. Case per million (cum)", 
                 "5. % Vaccinated (all)", "6. % Vaccinated (65+)"
                 ) %>%
    purrr::set_names(tmpVars)

tmpCombine %>%
    filter(name %in% all_of(tmpVars), !is.na(value)) %>%
    ggplot(aes(x=date, y=value)) + 
    geom_line(aes(color=tmpNamePopVec[countyFIPS], group=countyFIPS)) + 
    facet_wrap(~tmpVarNames[name], scales="free_y", ncol=2) + 
    labs(x=NULL, y=NULL, title="Evolution of metrics by select county") +
    scale_color_discrete("")

A function is written for the extracts:

compareBurdenVaxCounty <- function(lstBurden, 
                                   lstVax, 
                                   ctyIDs, 
                                   burdenVars=c("dpm7", "tdpm7", "cpm7", "tcpm7"), 
                                   vaxVars=c("vxcpoppct", "vxcgte65pct"), 
                                   plotVars=c(burdenVars, vaxVars),
                                   plotVarNames=c("1. Death per million", "2. Death per million (cum)", 
                                                  "3. Case per million", "4. Case per million (cum)", 
                                                  "5. % Vaccinated (all)", "6. % Vaccinated (65+)"
                                                  ), 
                                   printPlot=TRUE, 
                                   returnData=!isTRUE(printPlot)
                                   ) {

    # FUNCTION ARGUMENTS:
    # lstBurden: processed file containing county-level burden data
    # lstVax: processed file containing county-level vaccines data
    # ctyIDs: vector of county-FIPS to process OR named list of county-FIPS to amalgamate
    #         if vector, each plotted separately with county name as legend and color
    #         if named list, each element amalgamated with list name as legend and color
    # burdenVars: variables to plot from burden data
    # vaxVars: variables to plot from vaccines data
    # plotVars: variables to be plotted
    # plotVarNames: names to be associated to plotVars
    # printPlot: boolean, should plot be created and printed?
    # returnData: boolean, should data frame be returned?
    
    # Create valid list of countyFIPS for processing
    # If passed as list, convert to vector; convert all to 5-string characters
    if("list" %in% class(ctyIDs)) ctyUse <- zeroPad5(unname(unlist(ctyIDs)))
    else ctyUse <- zeroPad5(ctyIDs)
    
    # Extract name and population data
    dfNamePop <- lstBurden[["countyData"]] %>%
        filter(countyFIPS %in% ctyUse) %>%
        mutate(fullName=paste0(countyName, ", ", state, " (pop: ", round(pop/1000), "k)"))

    # Extract burden per-capita data
    dfBurden <- lstBurden[["dfPerCapita"]] %>%
        filter(countyFIPS %in% ctyUse)

    # Extract vaccines per-capita data
    dfVax <- lstVax[["vaxFix"]] %>%
        filter(FIPS %in% ctyUse)
    
    # Create integrated database of burden and vaccines
    dfCombine <- dfBurden %>%
        select(countyFIPS, date, all_of(burdenVars)) %>%
        full_join(dfVax %>% select(countyFIPS=FIPS, date, all_of(vaxVars)), 
                  by=c("countyFIPS", "date")
                  ) %>%
        pivot_longer(-c(countyFIPS, date))
    
    # If data to be amalgamated, run here
    if("list" %in% class(ctyIDs)) {
        # Create mapping frame
        mapFrame <- map_dfr(names(ctyIDs), .f=function(x) tibble::tibble(mapName=x, countyFIPS=ctyIDs[[x]]))
        # Add population data and mapped name
        dfCombine <- dfCombine %>%
            filter(!is.na(value)) %>%
            inner_join(select(dfNamePop, countyFIPS, pop), by="countyFIPS") %>%
            inner_join(mapFrame, by="countyFIPS") %>%
            group_by(mapName, date, name) %>%
            summarize(value=sum(value*pop)/sum(pop), pop=sum(pop), .groups="drop")
    }
    
    # Create vectors for mapping countyFIPS and metric abbreviations to descriptive names
    varNameVec <- plotVarNames %>% purrr::set_names(plotVars)
    
    # Create vectors for mapping countyFIPS to descriptive names (only relevant if not amalgamated)
    namePopVec <- dfNamePop$fullName %>% purrr::set_names(dfNamePop$countyFIPS)
    
    # Create and display plot (if requested)
    if(isTRUE(printPlot)) {
        p1 <- dfCombine %>%
            filter(name %in% all_of(plotVars), !is.na(value)) %>%
            ggplot(aes(x=date, y=value)) + 
            facet_wrap(~varNameVec[name], scales="free_y", ncol=2) + 
            labs(x=NULL, y=NULL, title="Evolution of metrics by select county") +
            scale_color_discrete("")
        # Add lines colored appropriately
        if("list" %in% class(ctyIDs)) p1 <- p1 + geom_line(aes(color=mapName, group=mapName))
        else p1 <- p1 + geom_line(aes(color=namePopVec[countyFIPS], group=countyFIPS))
        print(p1)
    }
    
    # Return data if requested
    if(isTRUE(returnData)) return(dfCombine)
    
}

# Run with defaults
compareBurdenVaxCounty(lstBurden=cty_newdata_20220507, lstVax=cty_vaxdata_20220508, ctyIDs=ctyList)

# Run for data only
compareBurdenVaxCounty(lstBurden=cty_newdata_20220507, lstVax=cty_vaxdata_20220508, ctyIDs=ctyList, printPlot=FALSE)
## # A tibble: 15,066 x 4
##    countyFIPS date       name        value
##    <chr>      <date>     <chr>       <dbl>
##  1 01089      2020-01-22 dpm7           NA
##  2 01089      2020-01-22 tdpm7          NA
##  3 01089      2020-01-22 cpm7           NA
##  4 01089      2020-01-22 tcpm7          NA
##  5 01089      2020-01-22 vxcpoppct      NA
##  6 01089      2020-01-22 vxcgte65pct    NA
##  7 29510      2020-01-22 dpm7           NA
##  8 29510      2020-01-22 tdpm7          NA
##  9 29510      2020-01-22 cpm7           NA
## 10 29510      2020-01-22 tcpm7          NA
## # ... with 15,056 more rows

The function has been updated so that it can take an amalgamation of counties:

# Single amalgamation
compareBurdenVaxCounty(lstBurden=cty_newdata_20220507, lstVax=cty_vaxdata_20220508, ctyIDs=list("all"=ctyList))

# Select states
keyStates <- c("NY", "LA", "CA", "TX")
tmpStates <- cty_newdata_20220507$countyData %>% 
    filter(pop >= 100000, pop <= 500000, state %in% all_of(keyStates))
tmpList <- lapply(keyStates, FUN=function(x) tmpStates %>% filter(state==x) %>% pull(countyFIPS)) %>%
    purrr::set_names(paste0(keyStates, " counties 100k-500k"))
compareBurdenVaxCounty(lstBurden=cty_newdata_20220507, lstVax=cty_vaxdata_20220508, ctyIDs=tmpList, returnData=TRUE)

## # A tibble: 17,352 x 5
##    mapName               date       name   value     pop
##    <chr>                 <date>     <chr>  <dbl>   <dbl>
##  1 CA counties 100k-500k 2020-01-25 cpm7   2.61  5357286
##  2 CA counties 100k-500k 2020-01-25 dpm7   0     5357286
##  3 CA counties 100k-500k 2020-01-25 tcpm7 17.5   5357286
##  4 CA counties 100k-500k 2020-01-25 tdpm7  0     5357286
##  5 CA counties 100k-500k 2020-01-26 cpm7   0.293 5357286
##  6 CA counties 100k-500k 2020-01-26 dpm7   0     5357286
##  7 CA counties 100k-500k 2020-01-26 tcpm7 17.8   5357286
##  8 CA counties 100k-500k 2020-01-26 tdpm7  0     5357286
##  9 CA counties 100k-500k 2020-01-27 cpm7   0.213 5357286
## 10 CA counties 100k-500k 2020-01-27 dpm7   0     5357286
## # ... with 17,342 more rows

The function is run on subsets of counties based on most recent vaccination data:

tmpSegVax <- cty_vaxdata_20220508$vaxFix %>% 
    filter(date==max(date), pop >= 50000, pop <= 500000, !is.na(vxcgte18pct)) 
tmpSegVax %>%
    ggplot(aes(x=vxcgte18pct)) + 
    geom_histogram(fill="lightblue", bins=100) + 
    lims(x=c(0, 100)) + 
    labs(x="% Vaccinated (18+)", y=NULL, title="# counties by % vaccinated (18+)", subtitle="Population 50k-500k")
## Warning: Removed 2 rows containing missing values (geom_bar).

tmpSegVax <- tmpSegVax %>%
    mutate(bucket=case_when(vxcgte18pct<55 ~ "0-54", 
                            vxcgte18pct<=70 ~ "55-70", 
                            vxcgte18pct<=100 ~ "71-100", 
                            TRUE ~ "err"
                            )
           )
tmpList <- lapply(c("0-54", "55-70", "71-100"), FUN=function(x) filter(tmpSegVax, bucket==x) %>% pull(FIPS)) %>%
    purrr::set_names("3. Under 55%", "2. 55%-70%", "1. Over 70%")
compareBurdenVaxCounty(lstBurden=cty_newdata_20220507, lstVax=cty_vaxdata_20220508, ctyIDs=tmpList, returnData=TRUE)

## # A tibble: 13,014 x 5
##    mapName     date       name    value      pop
##    <chr>       <date>     <chr>   <dbl>    <dbl>
##  1 1. Over 70% 2020-01-25 cpm7  0.187   55026673
##  2 1. Over 70% 2020-01-25 dpm7  0.00260 55026673
##  3 1. Over 70% 2020-01-25 tcpm7 1.23    55026673
##  4 1. Over 70% 2020-01-25 tdpm7 0.0182  55026673
##  5 1. Over 70% 2020-01-26 cpm7  0.0234  55026673
##  6 1. Over 70% 2020-01-26 dpm7  0       55026673
##  7 1. Over 70% 2020-01-26 tcpm7 1.26    55026673
##  8 1. Over 70% 2020-01-26 tdpm7 0.0182  55026673
##  9 1. Over 70% 2020-01-27 cpm7  0.0234  55026673
## 10 1. Over 70% 2020-01-27 dpm7  0       55026673
## # ... with 13,004 more rows

The function is run on subsets of counties based on most recent change in deaths per capita:

tmpSegDeath <- cty_newdata_20220507$dfPerCapita %>% 
    inner_join(select(cty_newdata_20220507$countyData, countyFIPS, pop), by="countyFIPS") %>%
    filter(date %in% c(max(date), max(date)-lubridate::days(360)), 
           pop >= 50000, pop <= 500000, 
           !is.na(tdpm)
           ) %>%
    group_by(countyFIPS) %>%
    mutate(deltaDeath=max(tdpm)-min(tdpm)) %>%
    ungroup()
tmpSegDeath %>%
    filter(date==max(date)) %>%
    ggplot(aes(x=deltaDeath)) + 
    geom_histogram(fill="lightblue", bins=100) + 
    labs(x="Death per million in most recent year", 
         y=NULL, 
         title="# counties by DPM in most recent year", 
         subtitle="Population 50k-500k"
         )

tmpSegDeath <- tmpSegDeath %>%
    filter(date==max(date)) %>%
    mutate(bucket=case_when(deltaDeath<1000 ~ "0-999", 
                            deltaDeath<=2000 ~ "1000-2000", 
                            deltaDeath<=4000 ~ "2001+", 
                            TRUE ~ "err"
                            )
           )
tmpList <- lapply(c("0-999", "1000-2000", "2001+"), 
                  FUN=function(x) filter(tmpSegDeath, bucket==x) %>% pull(countyFIPS)
                  ) %>%
    purrr::set_names("3. Under 1000", "2. 1000-2000", "1. Over 2000")
compareBurdenVaxCounty(lstBurden=cty_newdata_20220507, lstVax=cty_vaxdata_20220508, ctyIDs=tmpList, returnData=TRUE)

## # A tibble: 13,014 x 5
##    mapName      date       name  value      pop
##    <chr>        <date>     <chr> <dbl>    <dbl>
##  1 1. Over 2000 2020-01-25 cpm7      0 16960028
##  2 1. Over 2000 2020-01-25 dpm7      0 16960028
##  3 1. Over 2000 2020-01-25 tcpm7     0 16960028
##  4 1. Over 2000 2020-01-25 tdpm7     0 16960028
##  5 1. Over 2000 2020-01-26 cpm7      0 16960028
##  6 1. Over 2000 2020-01-26 dpm7      0 16960028
##  7 1. Over 2000 2020-01-26 tcpm7     0 16960028
##  8 1. Over 2000 2020-01-26 tdpm7     0 16960028
##  9 1. Over 2000 2020-01-27 cpm7      0 16960028
## 10 1. Over 2000 2020-01-27 dpm7      0 16960028
## # ... with 13,004 more rows

The function is run on subsets of counties based on most recent change in cases per capita:

tmpSegCase <- cty_newdata_20220507$dfPerCapita %>% 
    inner_join(select(cty_newdata_20220507$countyData, countyFIPS, pop), by="countyFIPS") %>%
    filter(date %in% c(max(date), max(date)-lubridate::days(360)), 
           pop >= 50000, pop <= 500000, 
           !is.na(tcpm)
           ) %>%
    group_by(countyFIPS) %>%
    mutate(deltaCase=max(tcpm)-min(tcpm)) %>%
    ungroup()
tmpSegCase %>%
    filter(date==max(date)) %>%
    ggplot(aes(x=deltaCase)) + 
    geom_histogram(fill="lightblue", bins=100) + 
    labs(x="Cases per million in most recent year", 
         y=NULL, 
         title="# counties by CPM in most recent year", 
         subtitle="Population 50k-500k"
         )

tmpSegCase <- tmpSegCase %>%
    filter(date==max(date)) %>%
    mutate(bucket=case_when(deltaCase<125000 ~ "0-125k", 
                            deltaCase<=175000 ~ "125k-175k", 
                            deltaCase<=300000 ~ "175k+", 
                            TRUE ~ "err"
                            )
           )
tmpList <- lapply(c("0-125k", "125k-175k", "175k+"), 
                  FUN=function(x) filter(tmpSegCase, bucket==x) %>% pull(countyFIPS)
                  ) %>%
    purrr::set_names("3. Under 125k", "2. 125k-175k", "1. Over 175k")
compareBurdenVaxCounty(lstBurden=cty_newdata_20220507, lstVax=cty_vaxdata_20220508, ctyIDs=tmpList, returnData=TRUE)

## # A tibble: 13,014 x 5
##    mapName      date       name     value      pop
##    <chr>        <date>     <chr>    <dbl>    <dbl>
##  1 1. Over 175k 2020-01-25 cpm7  1.13e- 1 17660748
##  2 1. Over 175k 2020-01-25 dpm7  8.09e- 3 17660748
##  3 1. Over 175k 2020-01-25 tcpm7 7.85e- 1 17660748
##  4 1. Over 175k 2020-01-25 tdpm7 5.66e- 2 17660748
##  5 1. Over 175k 2020-01-26 cpm7  8.09e- 3 17660748
##  6 1. Over 175k 2020-01-26 dpm7  0        17660748
##  7 1. Over 175k 2020-01-26 tcpm7 7.93e- 1 17660748
##  8 1. Over 175k 2020-01-26 tdpm7 5.66e- 2 17660748
##  9 1. Over 175k 2020-01-27 cpm7  6.59e-18 17660748
## 10 1. Over 175k 2020-01-27 dpm7  0        17660748
## # ... with 13,004 more rows

The process is converted to functional form, allowing for custom labels:

compareBurdenVaxCounty <- function(lstBurden, 
                                   lstVax, 
                                   ctyIDs, 
                                   burdenVars=c("dpm7", "tdpm7", "cpm7", "tcpm7"), 
                                   vaxVars=c("vxcpoppct", "vxcgte65pct"), 
                                   plotVars=c(burdenVars, vaxVars),
                                   plotVarNames=c("1. Death per million", "2. Death per million (cum)", 
                                                  "3. Case per million", "4. Case per million (cum)", 
                                                  "5. % Vaccinated (all)", "6. % Vaccinated (65+)"
                                                  ), 
                                   printPlot=TRUE, 
                                   p1Title="Evolution of metrics by select county",
                                   p1SubTitle=ggplot2::waiver(),
                                   p1ScaleLabel="",
                                   returnData=!isTRUE(printPlot)
                                   ) {

    # FUNCTION ARGUMENTS:
    # lstBurden: processed file containing county-level burden data
    # lstVax: processed file containing county-level vaccines data
    # ctyIDs: vector of county-FIPS to process OR named list of county-FIPS to amalgamate
    #         if vector, each plotted separately with county name as legend and color
    #         if named list, each element amalgamated with list name as legend and color
    # burdenVars: variables to plot from burden data
    # vaxVars: variables to plot from vaccines data
    # plotVars: variables to be plotted
    # plotVarNames: names to be associated to plotVars
    # printPlot: boolean, should plot be created and printed?
    # p1Title: title to be used in plot
    # p1SubTitle: subtitle to be used in plot
    # p1ScaleLabel: label to be used for the color scale in plot
    # returnData: boolean, should data frame be returned?
    
    # Create valid list of countyFIPS for processing
    # If passed as list, convert to vector; convert all to 5-string characters
    if("list" %in% class(ctyIDs)) ctyUse <- zeroPad5(unname(unlist(ctyIDs)))
    else ctyUse <- zeroPad5(ctyIDs)
    
    # Extract name and population data
    dfNamePop <- lstBurden[["countyData"]] %>%
        filter(countyFIPS %in% ctyUse) %>%
        mutate(fullName=paste0(countyName, ", ", state, " (pop: ", round(pop/1000), "k)"))

    # Extract burden per-capita data
    dfBurden <- lstBurden[["dfPerCapita"]] %>%
        filter(countyFIPS %in% ctyUse)

    # Extract vaccines per-capita data
    dfVax <- lstVax[["vaxFix"]] %>%
        filter(FIPS %in% ctyUse)
    
    # Create integrated database of burden and vaccines
    dfCombine <- dfBurden %>%
        select(countyFIPS, date, all_of(burdenVars)) %>%
        full_join(dfVax %>% select(countyFIPS=FIPS, date, all_of(vaxVars)), 
                  by=c("countyFIPS", "date")
                  ) %>%
        pivot_longer(-c(countyFIPS, date))
    
    # If data to be amalgamated, run here
    if("list" %in% class(ctyIDs)) {
        # Create mapping frame
        mapFrame <- map_dfr(names(ctyIDs), .f=function(x) tibble::tibble(mapName=x, countyFIPS=ctyIDs[[x]]))
        # Add population data and mapped name
        dfCombine <- dfCombine %>%
            filter(!is.na(value)) %>%
            inner_join(select(dfNamePop, countyFIPS, pop), by="countyFIPS") %>%
            inner_join(mapFrame, by="countyFIPS") %>%
            group_by(mapName, date, name) %>%
            summarize(value=sum(value*pop)/sum(pop), pop=sum(pop), n=n(), .groups="drop") %>%
            group_by(mapName) %>%
            mutate(mapName2=paste0(mapName, " (n=", max(n), ")")) %>%
            ungroup()
    }
    
    # Create vectors for mapping countyFIPS and metric abbreviations to descriptive names
    varNameVec <- plotVarNames %>% purrr::set_names(plotVars)
    
    # Create vectors for mapping countyFIPS to descriptive names (only relevant if not amalgamated)
    namePopVec <- dfNamePop$fullName %>% purrr::set_names(dfNamePop$countyFIPS)
    
    # Create and display plot (if requested)
    if(isTRUE(printPlot)) {
        p1 <- dfCombine %>%
            filter(name %in% all_of(plotVars), !is.na(value)) %>%
            ggplot(aes(x=date, y=value)) + 
            facet_wrap(~varNameVec[name], scales="free_y", ncol=2) + 
            labs(x=NULL, y=NULL, title=p1Title, subtitle=p1SubTitle) +
            scale_color_discrete(p1ScaleLabel)
        # Add lines colored appropriately
        if("list" %in% class(ctyIDs)) {
            if(is.null(p1ScaleLabel) | p1ScaleLabel=="") p1 <- p1 + geom_line(aes(color=mapName, group=mapName))
            else p1 <- p1 + geom_line(aes(color=mapName2, group=mapName2))
        }
        else p1 <- p1 + geom_line(aes(color=namePopVec[countyFIPS], group=countyFIPS))
        print(p1)
    }
    
    # Return data if requested
    if(isTRUE(returnData)) return(dfCombine)
    
}

plotCountyEvolution <- function(lst,
                                lstVax,
                                cutsLabels,
                                baseVar,
                                lstFilter=list(),
                                lstExclude=list(),
                                segVar=baseVar,
                                popRange=c(0, +Inf), 
                                dateRange=c(lubridate::days(0)), 
                                p1Title="Evolution of metrics by select county",
                                p1SubTitle=ggplot2::waiver(),
                                p1ScaleLabel=""
                                ) {
    
    # FUNCTION ARGUMNETS:
    # lst: processed list file containing per-capita and county population data
    # lstVax: processed list file containing vaccines data
    # cutsLabels: named character vector of form c("label"="values-up-tp")
    # baseVar: base variable to be used for segmenting (needs to be not-NA)
    # lstFilter: named list of items to include when setting df (e.g., list("state"=state.abb[1:10]))
    # lstExclude: named list of items to exclude when setting df (e.g., list("state"=state.abb[41:50]))
    # segVar: variable used in segmentation (may be a transform of baseVar, allow for different name)
    # popRange: character vector of length 2 for minimum and maximum population to include
    # dateRange: dates to be considered for segmenting (all subtracted from max(date))
    #            if length 1, that date is used for segmenting on the key metric
    #            if length 2, change in variables between those dates is used on the key metric
    # p1Title: title to be used in plot
    # p1SubTitle: subtitle to be used in plot
    # p1ScaleLabel: label to be used for the color scale in plot
    
    # Check that dateRange is length-1 or length-2
    if(!(length(dateRange) %in% c(1, 2))) error("\nMust pass a dateRange vector of length 1 or length 2\n")
    
    # Create database including population, filter by population and relevant dates, remove NA issues
    df <- lst[["dfPerCapita"]] %>% 
        inner_join(select(lst[["countyData"]], countyFIPS, pop), by="countyFIPS") %>%
        inner_join(select(lstVax[["vaxFix"]], date, countyFIPS=FIPS, vxcpoppct, vxcgte18pct, vxcgte65pct), 
                  by=c("date", "countyFIPS")
                  ) %>%
        filter(pop >= popRange[1], 
               pop <= popRange[2], 
               date %in% (max(date)-dateRange), 
               !is.na(get(baseVar))
               ) %>%
        rowFilter(lstFilter=lstFilter, lstExclude=lstExclude)
 
    # If daterange is of length-2, calculate change in variable
    if(length(dateRange)==2) {
        if(segVar==baseVar) segVar <- "tempNew"
        df <- df %>%
            group_by(countyFIPS) %>%
            mutate(tempNew=sum(get(baseVar)*(date==max(date)))-sum(get(baseVar)*(date!=max(date)))) %>%
            ungroup() %>%
            colRenamer(vecRename=c("tempNew"=segVar))
    }

    # Histogram for key variables
    p1 <- df %>%
        filter(date==max(date)) %>%
        ggplot(aes_string(x=segVar)) + 
        geom_histogram(fill="lightblue", bins=100) + 
        labs(x=paste0("Variable: ", segVar), 
             y=NULL, 
             title=paste0("# counties by ", segVar, " bucket by time period")
             ) + 
        facet_wrap(~date)
    print(p1)
    
    # Create buckets for plotting
    df <- df %>%
        filter(date==max(date)) %>%
        mutate(bucket=c(names(cutsLabels), "999. error high")[1+findInterval(get(segVar), unname(cutsLabels))])
    
    # Create county list
    tmpList <- lapply(c(names(cutsLabels), "999. error high"), 
                      FUN=function(x) filter(df, bucket==x) %>% pull(countyFIPS)
                      ) %>%
        purrr::set_names(c(names(cutsLabels), "999. error high"))
    
    # Create plot of metrics over time
    compareBurdenVaxCounty(lstBurden=lst, 
                           lstVax=lstVax, 
                           ctyIDs=tmpList, 
                           returnData=TRUE, 
                           p1Title=p1Title, 
                           p1SubTitle=p1SubTitle, 
                           p1ScaleLabel=p1ScaleLabel
                           )
    
}

plotCountyEvolution(cty_newdata_20220507, 
                    lstVax=cty_vaxdata_20220508,
                    baseVar="tcpm",
                    lstExclude=list("state"=c("CO", "GA", "HI", "NM", "OH", "SD", "TX", "UT", "VA", "WV")),
                    popRange=c(100000, 250000), 
                    cutsLabels=c("1. Under 200k"=200000, "2. 200k-300k"=300000, "3. Over 300k"=500000)
                    )

## # A tibble: 13,014 x 7
##    mapName       date       name   value     pop     n mapName2            
##    <chr>         <date>     <chr>  <dbl>   <dbl> <int> <chr>               
##  1 1. Under 200k 2020-01-25 cpm7  0.221  5806894    37 1. Under 200k (n=37)
##  2 1. Under 200k 2020-01-25 dpm7  0      5806894    37 1. Under 200k (n=37)
##  3 1. Under 200k 2020-01-25 tcpm7 1.35   5806894    37 1. Under 200k (n=37)
##  4 1. Under 200k 2020-01-25 tdpm7 0      5806894    37 1. Under 200k (n=37)
##  5 1. Under 200k 2020-01-26 cpm7  0.0492 5806894    37 1. Under 200k (n=37)
##  6 1. Under 200k 2020-01-26 dpm7  0      5806894    37 1. Under 200k (n=37)
##  7 1. Under 200k 2020-01-26 tcpm7 1.40   5806894    37 1. Under 200k (n=37)
##  8 1. Under 200k 2020-01-26 tdpm7 0      5806894    37 1. Under 200k (n=37)
##  9 1. Under 200k 2020-01-27 cpm7  0.0492 5806894    37 1. Under 200k (n=37)
## 10 1. Under 200k 2020-01-27 dpm7  0      5806894    37 1. Under 200k (n=37)
## # ... with 13,004 more rows
plotCountyEvolution(cty_newdata_20220507, 
                    lstVax=cty_vaxdata_20220508,
                    baseVar="tdpm",
                    lstExclude=list("state"=c("CO", "GA", "HI", "NM", "OH", "SD", "TX", "UT", "VA", "WV")),
                    popRange=c(100000, 250000), 
                    cutsLabels=c("1. Under 2000"=2000, "2. 2000-4000"=4000, "3. Over 4000"=10000)
                    )

## # A tibble: 13,014 x 7
##    mapName       date       name   value     pop     n mapName2            
##    <chr>         <date>     <chr>  <dbl>   <dbl> <int> <chr>               
##  1 1. Under 2000 2020-01-25 cpm7  0.142  9051032    57 1. Under 2000 (n=57)
##  2 1. Under 2000 2020-01-25 dpm7  0.0158 9051032    57 1. Under 2000 (n=57)
##  3 1. Under 2000 2020-01-25 tcpm7 0.868  9051032    57 1. Under 2000 (n=57)
##  4 1. Under 2000 2020-01-25 tdpm7 0.110  9051032    57 1. Under 2000 (n=57)
##  5 1. Under 2000 2020-01-26 cpm7  0.0316 9051032    57 1. Under 2000 (n=57)
##  6 1. Under 2000 2020-01-26 dpm7  0      9051032    57 1. Under 2000 (n=57)
##  7 1. Under 2000 2020-01-26 tcpm7 0.900  9051032    57 1. Under 2000 (n=57)
##  8 1. Under 2000 2020-01-26 tdpm7 0.110  9051032    57 1. Under 2000 (n=57)
##  9 1. Under 2000 2020-01-27 cpm7  0.0316 9051032    57 1. Under 2000 (n=57)
## 10 1. Under 2000 2020-01-27 dpm7  0      9051032    57 1. Under 2000 (n=57)
## # ... with 13,004 more rows
plotCountyEvolution(cty_newdata_20220507, 
                    lstVax=cty_vaxdata_20220508,
                    baseVar="vxcgte65pct",
                    lstExclude=list("state"=c("CO", "GA", "HI", "NM", "OH", "SD", "TX", "UT", "VA", "WV")),
                    popRange=c(100000, 250000), 
                    cutsLabels=c("1. Under 80%"=80, "2. 80%-90%"=90, "3. Over 90%"=100)
                    )

## # A tibble: 13,014 x 7
##    mapName      date       name     value     pop     n mapName2           
##    <chr>        <date>     <chr>    <dbl>   <dbl> <int> <chr>              
##  1 1. Under 80% 2020-01-25 cpm7  3.12e- 1 6406205    41 1. Under 80% (n=41)
##  2 1. Under 80% 2020-01-25 dpm7  0        6406205    41 1. Under 80% (n=41)
##  3 1. Under 80% 2020-01-25 tcpm7 2.16e+ 0 6406205    41 1. Under 80% (n=41)
##  4 1. Under 80% 2020-01-25 tdpm7 0        6406205    41 1. Under 80% (n=41)
##  5 1. Under 80% 2020-01-26 cpm7  2.23e- 2 6406205    41 1. Under 80% (n=41)
##  6 1. Under 80% 2020-01-26 dpm7  0        6406205    41 1. Under 80% (n=41)
##  7 1. Under 80% 2020-01-26 tcpm7 2.19e+ 0 6406205    41 1. Under 80% (n=41)
##  8 1. Under 80% 2020-01-26 tdpm7 0        6406205    41 1. Under 80% (n=41)
##  9 1. Under 80% 2020-01-27 cpm7  1.82e-17 6406205    41 1. Under 80% (n=41)
## 10 1. Under 80% 2020-01-27 dpm7  0        6406205    41 1. Under 80% (n=41)
## # ... with 13,004 more rows
plotCountyEvolution(cty_newdata_20220507, 
                    lstVax=cty_vaxdata_20220508,
                    baseVar="tdpm",
                    segVar="d_tdpm_360",
                    lstExclude=list("state"=c("CO", "GA", "HI", "NM", "OH", "SD", "TX", "UT", "VA", "WV")),
                    popRange=c(100000, 250000), 
                    cutsLabels=c("1. Under 1000"=1000, "2. 1000-2000"=2000, "3. Over 2000"=4000), 
                    dateRange=c(lubridate::days(0), lubridate::days(360)), 
                    p1ScaleLabel="Segmented By: DPM\n(most recent 360 days)", 
                    p1SubTitle=paste0("US counties of 100k-250k (excluding states with spiky data)\n")
                    )

## # A tibble: 13,014 x 7
##    mapName       date       name   value      pop     n mapName2            
##    <chr>         <date>     <chr>  <dbl>    <dbl> <int> <chr>               
##  1 1. Under 1000 2020-01-25 cpm7  0.128  13373433    83 1. Under 1000 (n=83)
##  2 1. Under 1000 2020-01-25 dpm7  0.0107 13373433    83 1. Under 1000 (n=83)
##  3 1. Under 1000 2020-01-25 tcpm7 0.812  13373433    83 1. Under 1000 (n=83)
##  4 1. Under 1000 2020-01-25 tdpm7 0.0748 13373433    83 1. Under 1000 (n=83)
##  5 1. Under 1000 2020-01-26 cpm7  0.0214 13373433    83 1. Under 1000 (n=83)
##  6 1. Under 1000 2020-01-26 dpm7  0      13373433    83 1. Under 1000 (n=83)
##  7 1. Under 1000 2020-01-26 tcpm7 0.833  13373433    83 1. Under 1000 (n=83)
##  8 1. Under 1000 2020-01-26 tdpm7 0.0748 13373433    83 1. Under 1000 (n=83)
##  9 1. Under 1000 2020-01-27 cpm7  0.0214 13373433    83 1. Under 1000 (n=83)
## 10 1. Under 1000 2020-01-27 dpm7  0      13373433    83 1. Under 1000 (n=83)
## # ... with 13,004 more rows
# Pull states from West South Central
wsc <- state.abb[state.division %in% c("West South Central", "East South Central")]

plotCountyEvolution(cty_newdata_20220507, 
                    lstVax=cty_vaxdata_20220508,
                    baseVar="tdpm",
                    segVar="d_tdpm_360",
                    lstFilter = list("state"=wsc),
                    lstExclude=list("state"=c("CO", "GA", "HI", "NM", "OH", "SD", "TX", "UT", "VA", "WV")),
                    popRange=c(100000, 250000), 
                    cutsLabels=c("1. Under 1000"=1000, "2. 1000-2000"=2000, "3. Over 2000"=4000), 
                    dateRange=c(lubridate::days(0), lubridate::days(360)), 
                    p1ScaleLabel="Segmented By: DPM\n(most recent 360 days)", 
                    p1SubTitle=paste0("South Central counties of 100k-250k (excluding states with spiky data)\n")
                    )

## # A tibble: 13,014 x 7
##    mapName       date       name  value    pop     n mapName2           
##    <chr>         <date>     <chr> <dbl>  <dbl> <int> <chr>              
##  1 1. Under 1000 2020-01-25 cpm7      0 848810     4 1. Under 1000 (n=4)
##  2 1. Under 1000 2020-01-25 dpm7      0 848810     4 1. Under 1000 (n=4)
##  3 1. Under 1000 2020-01-25 tcpm7     0 848810     4 1. Under 1000 (n=4)
##  4 1. Under 1000 2020-01-25 tdpm7     0 848810     4 1. Under 1000 (n=4)
##  5 1. Under 1000 2020-01-26 cpm7      0 848810     4 1. Under 1000 (n=4)
##  6 1. Under 1000 2020-01-26 dpm7      0 848810     4 1. Under 1000 (n=4)
##  7 1. Under 1000 2020-01-26 tcpm7     0 848810     4 1. Under 1000 (n=4)
##  8 1. Under 1000 2020-01-26 tdpm7     0 848810     4 1. Under 1000 (n=4)
##  9 1. Under 1000 2020-01-27 cpm7      0 848810     4 1. Under 1000 (n=4)
## 10 1. Under 1000 2020-01-27 dpm7      0 848810     4 1. Under 1000 (n=4)
## # ... with 13,004 more rows
plotCountyEvolution(cty_newdata_20220507, 
                    lstVax=cty_vaxdata_20220508,
                    baseVar="vxcgte65pct",
                    lstFilter = list("state"=wsc),
                    lstExclude=list("state"=c("CO", "GA", "HI", "NM", "OH", "SD", "TX", "UT", "VA", "WV")),
                    popRange=c(100000, 250000), 
                    cutsLabels=c("1. Under 80%"=80, "2. 80%-90%"=90, "3. Over 90%"=100),
                    p1ScaleLabel="Segmented By: %vax\n(65+ population)", 
                    p1SubTitle=paste0("South Central counties of 100k-250k (excluding states with spiky data)\n")
                    )

## # A tibble: 13,014 x 7
##    mapName      date       name  value     pop     n mapName2           
##    <chr>        <date>     <chr> <dbl>   <dbl> <int> <chr>              
##  1 1. Under 80% 2020-01-25 cpm7      0 2567398    17 1. Under 80% (n=17)
##  2 1. Under 80% 2020-01-25 dpm7      0 2567398    17 1. Under 80% (n=17)
##  3 1. Under 80% 2020-01-25 tcpm7     0 2567398    17 1. Under 80% (n=17)
##  4 1. Under 80% 2020-01-25 tdpm7     0 2567398    17 1. Under 80% (n=17)
##  5 1. Under 80% 2020-01-26 cpm7      0 2567398    17 1. Under 80% (n=17)
##  6 1. Under 80% 2020-01-26 dpm7      0 2567398    17 1. Under 80% (n=17)
##  7 1. Under 80% 2020-01-26 tcpm7     0 2567398    17 1. Under 80% (n=17)
##  8 1. Under 80% 2020-01-26 tdpm7     0 2567398    17 1. Under 80% (n=17)
##  9 1. Under 80% 2020-01-27 cpm7      0 2567398    17 1. Under 80% (n=17)
## 10 1. Under 80% 2020-01-27 dpm7      0 2567398    17 1. Under 80% (n=17)
## # ... with 13,004 more rows

Further exploration is made of growth rates in key metrics:

# Create integrated data, using only select time periods
dfVaxBurden_20220507 <- cty_newdata_20220507$dfPerCapita %>% 
    inner_join(cty_vaxdata_20220508$vaxFix, by=c("countyFIPS"="FIPS", "state", "date")) %>% 
    filter(date %in% c(max(date)-lubridate::days(c(4, 369))))

# Summarize and create plot
dfPlotVaxBurden <- dfVaxBurden_20220507 %>% 
    filter(pop >= 25000) %>%
    mutate(minday=ifelse(date==min(date), 1, 0), maxday=ifelse(date==max(date), 1, 0)) %>%
    group_by(countyFIPS) %>% 
    filter(n()==2) %>%
    summarize(across(c(tcpm7, tdpm7, cpm7, dpm7), 
                     .fns=function(x) sum(x*maxday)/sum(x*minday)-1, 
                     .names="g_{.col}"
                     ), 
              pop=median(pop), 
              mu_vxcpoppct=mean(vxcpoppct)
              ) %>% 
    pivot_longer(-c(countyFIPS, pop, mu_vxcpoppct)) %>%
    filter(value >= -1, 
           value <= 10
           ) 

dfPlotVaxBurden %>% 
    ggplot(aes(x=mu_vxcpoppct, y=value)) + 
    geom_point(aes(size=pop), alpha=0.25) + 
    geom_smooth(aes(weight=pop), method="lm") + 
    geom_hline(yintercept=c(-1, 0, 1), lty=2, color="red") +
    labs(y="Growth rate from 1 year ago", 
         x="% population vaccinated", 
         title="Association of 1-year change in burden with vaccination", 
         subtitle="Linear model is population weighted for counties >25k population with growth between -1 and 10"
         ) +
    facet_wrap(~name)
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 16 rows containing non-finite values (stat_smooth).
## Warning: Removed 16 rows containing missing values (geom_point).

dfPlotVaxBurden %>%
    lm(value ~ mu_vxcpoppct:name + name + 0, data=., weights=pop) %>%
    summary()
## 
## Call:
## lm(formula = value ~ mu_vxcpoppct:name + name + 0, data = ., 
##     weights = pop)
## 
## Weighted Residuals:
##     Min      1Q  Median      3Q     Max 
## -3468.8  -142.7   -12.7    97.9 16322.7 
## 
## Coefficients:
##                           Estimate Std. Error t value Pr(>|t|)    
## nameg_cpm7               -3.673348   0.156450 -23.479  < 2e-16 ***
## nameg_dpm7                0.426972   0.175463   2.433  0.01499 *  
## nameg_tcpm7               1.205051   0.151815   7.938 2.47e-15 ***
## nameg_tdpm7               1.577777   0.151850  10.390  < 2e-16 ***
## mu_vxcpoppct:nameg_cpm7   0.098047   0.003404  28.801  < 2e-16 ***
## mu_vxcpoppct:nameg_dpm7  -0.011104   0.003751  -2.960  0.00309 ** 
## mu_vxcpoppct:nameg_tcpm7  0.008036   0.003293   2.440  0.01471 *  
## mu_vxcpoppct:nameg_tdpm7 -0.019031   0.003294  -5.777 8.03e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 556.3 on 5639 degrees of freedom
##   (16 observations deleted due to missingness)
## Multiple R-squared:  0.4357, Adjusted R-squared:  0.4349 
## F-statistic: 544.2 on 8 and 5639 DF,  p-value: < 2.2e-16

The process is converted to functional form:

# 1. Create integrated data
integrateCountyBurdenVax <- function(lstBurden, lstVax, keyDates=NULL) {
    
    # FUNCTION ARGUMENTS:
    # lstBurden: processed county-level burden list containing dfPerCapita
    # lstVax: processed county-level vaccines list containing vaxFix
    # keyDates: character vector of integers for days to subtract from max(date) and include
    #           NULL means include all
    
    # Convert keyDates to useDates appropriately
    useDates <- sort(intersect(unique(lstBurden[["dfPerCapita"]]$date), unique(lstVax[["vaxFix"]]$date)))
    useDates <- as.Date(useDates, origin="1970-01-01")
    if(!is.null(keyDates)) useDates <- useDates[useDates %in% (max(useDates)-lubridate::days(keyDates))]
    
    # Create integrated burden data, using requested time periods
    lstBurden[["dfPerCapita"]] %>% 
        filter(date %in% useDates) %>%
        inner_join(lstVax[["vaxFix"]], by=c("countyFIPS"="FIPS", "state", "date"))
    
}


# 2. Create integrated county plot data
makeIntegratedCountyPlotData <- function(df, 
                                         burdenVars=c("tcpm7", "tdpm7", "cpm7", "dpm7"), 
                                         vaxVars=c("vxcpoppct", "vxcgte18pct", "vxcgte65pct"), 
                                         makeBurdenGrowth=TRUE,
                                         fnBurden=mean,
                                         makeVaxGrowth=FALSE,
                                         fnVax=mean
                                         ) {

    # FUNCTION ARGUMENTS:
    # df: processed data frame with integrated burden and vaccines
    # burdenVars: burden variables to be summarized
    # vaxVars: vaccine variables to be summarized
    # makeBurdenGrowth: boolean, should burden be calculated as latest minus earliest?
    # fnBurden: function for summarizing burden fields (used only if makeBurdenGrowth is FALSE)
    # makeVaxGrowth: boolean, should vaccines be calculated as latest minus earliest?
    # fnVax: function for summarizing vaccine fields (used only if makeBurdenGrowth is FALSE)
    
    # Create the prefixes for the burden and vaccines variables
    burdenPrefix <- ifelse(isTRUE(makeBurdenGrowth), "g", deparse(substitute(fnBurden)))
    vaxPrefix <- ifelse(isTRUE(makeVaxGrowth), "g", deparse(substitute(fnVax)))
    
    # Create delta variables if requested
    df %>% 
        mutate(minday=ifelse(date==min(date), 1, 0), maxday=ifelse(date==max(date), 1, 0)) %>%
        group_by(countyFIPS) %>% 
        summarize(across(all_of(burdenVars), 
                         .fns=if(isTRUE(makeBurdenGrowth)) function(x) sum(x*maxday)/sum(x*minday)-1 else fnBurden, 
                         .names=paste0(burdenPrefix, "_{.col}")
                         ), 
                  pop=median(pop), 
                  across(all_of(vaxVars), 
                         .fns=if(isTRUE(makeVaxGrowth)) function(x) sum(x*maxday)/sum(x*minday)-1 else fnVax, 
                         .names=paste0(vaxPrefix, "_{.col}")
                         ),
                  hasMaxMin=(sum(maxday)==1 & sum(minday)==1)
                  ) %>% 
        pivot_longer(-c(countyFIPS))
    
}

#3. Create plots
plotIntegratedCounty <- function(df, 
                                 demVars=c("countyFIPS", "pop", "hasMaxMin"), 
                                 vaxVar="mean_vxcpoppct", 
                                 burdenVars=c("g_tcpm7", "g_tdpm7", "g_cpm7", "g_dpm7"), 
                                 checkMaxMin=TRUE, 
                                 popRange=c(1, +Inf), 
                                 valueRange=c(-1, 10),
                                 xLabel=NULL,
                                 yLabel="Growth rate from 1 year ago",
                                 plotTitle="Association of 1-year change in burden with vaccination",
                                 plotSubtitle=NULL,
                                 returnData=TRUE
                                 ) {

    # FUNCTION ARGUMENTS:
    # df: data frame from makeIntegratedCountyPlotData()
    # demVars: variables for demographics
    # vaxVar: x-axis variable (vaccine)
    # burdenVars: variables on the facetted y-axes (burden)
    # checkMaxMin: boolean, keep only counties with data on the max and min dates (growth meaningless otherwise)
    # popRange: acceptable population range for data
    # valueRange: range of values to keep for plotting metrics
    # xLabel: label for the x-axis (NULL means create from parameters)
    # yLabel: label for the y-axis
    # plotTitle: title for the plot
    # plotSubtitle: subtitle for the plot
    # returnData: boolean, should data (df) be returned?
    
    # Create labels where needed
    if(is.null(xLabel)) {
        xLabel <- paste0("% population ", 
                         if(vaxVar=="mean_vxcgte18pct") "18+) " else if(vaxVar=="mean_vxcgte65pct") "65+) " else "", 
                         "vaccinated"
                         )
    }
    if(is.null(plotSubtitle)) {
        if(min(popRange)<=1 & max(popRange)==+Inf) popLab <- " of any "
        else if(min(popRange)<=1 & max(popRange)<+Inf) popLab <- paste0(" <", round(max(popRange)/1000, 1), "k ")
        else if(min(popRange)>1 & max(popRange)==Inf) popLab <- paste0(" >", round(min(popRange)/1000, 1), "k ")
        else popLab <- paste0(" between ", round(min(popRange)/1000, 1), "k and ", round(max(popRange)/1000, 1), "k ")
        if(min(valueRange)==-Inf & max(valueRange)==+Inf) valueLab <- " of any value"
        else if(min(valueRange)==-Inf & max(valueRange)<+Inf) valueLab <- paste0(" <", max(valueRange))
        else if(min(valueRange)>-Inf & max(valueRange)==Inf) valueLab <- paste0(" >", min(valueRange))
        else valueLab <- paste0(" between ", min(valueRange), " and ", max(valueRange))
        plotSubtitle <- paste0("Linear model is population weighted for counties", 
                               popLab, 
                               "population with y-axis", 
                               valueLab
                               )
    }
    # Create the data frame
    df <- df %>% 
        filter(name %in% c(all_of(demVars), all_of(vaxVar), all_of(burdenVars))) %>%
        pivot_wider(c(countyFIPS)) %>%
        filter(if(isTRUE(checkMaxMin)) hasMaxMin==1 else TRUE, 
               pop>=popRange[1], 
               pop<=popRange[2]
               ) %>%
        pivot_longer(-c(all_of(demVars), all_of(vaxVar))) %>%
        filter(value >= valueRange[1], value <= valueRange[2])
    
    # Create and print the plot
    p1 <- df %>%
        ggplot(aes_string(x=vaxVar, y="value")) + 
        geom_point(aes(size=pop), alpha=0.25) + 
        geom_smooth(aes(weight=pop), method="lm") + 
        geom_hline(yintercept=c(-1, 0, 1), lty=2, color="red") +
        labs(y=yLabel, 
             x=xLabel, 
             title=plotTitle, 
             subtitle=plotSubtitle
             ) +
        facet_wrap(~name)
    print(p1)

    if(isTRUE(returnData)) return(df)
    
}

#4. Create regressions
regIntegratedCounty <- function(df, 
                                vaxVar="mean_vxcpoppct", 
                                returnData=FALSE
                                ) {
    
    # FUNCTION ARGUMENTS:
    # df: data frame from lotIntegratedCounty
    # vaxVar: x-axis variable (vaccine)
    # returnData: boolean, should data (df) be returned?
    
    df %>%
        lm(value ~ get(vaxVar):name + name + 0, data=., weights=pop) %>%
        summary() %>%
        print()
    
    if(isTRUE(returnData)) return(df)
    
}

# Example for all dates and for select dates
integrateCountyBurdenVax(cty_newdata_20220507, cty_vaxdata_20220508)
## # A tibble: 1,599,278 x 22
##    countyFIPS state date       cases new_cases deaths new_deaths   tcpm   cpm
##    <chr>      <chr> <date>     <dbl>     <dbl>  <dbl>      <dbl>  <dbl> <dbl>
##  1 01001      AL    2020-12-13  3233         0     41          0 57868.     0
##  2 01003      AL    2020-12-13 10489         0    141          0 46987.     0
##  3 01005      AL    2020-12-13  1264         0     30          0 51203.     0
##  4 01007      AL    2020-12-13  1398         0     39          0 62427.     0
##  5 01009      AL    2020-12-13  3663         0     47          0 63345.     0
##  6 01011      AL    2020-12-13   723         0     20          0 71577.     0
##  7 01013      AL    2020-12-13  1289         0     44          0 66279.     0
##  8 01015      AL    2020-12-13  7658         0    129          0 67409.     0
##  9 01017      AL    2020-12-13  1982         0     55          0 59602.     0
## 10 01019      AL    2020-12-13  1167         0     24          0 44549.     0
## # ... with 1,599,268 more rows, and 13 more variables: tdpm <dbl>, dpm <dbl>,
## #   tcpm7 <dbl>, cpm7 <dbl>, tdpm7 <dbl>, dpm7 <dbl>, countyName <chr>,
## #   vxcpoppct <dbl>, vxcgte18pct <dbl>, vxcgte65pct <dbl>, pop <dbl>,
## #   popgte18 <dbl>, popgte65 <dbl>
integrateCountyBurdenVax(cty_newdata_20220507, cty_vaxdata_20220508, keyDates=c(4, 369))
## # A tibble: 6,284 x 22
##    countyFIPS state date       cases new_cases deaths new_deaths    tcpm   cpm
##    <chr>      <chr> <date>     <dbl>     <dbl>  <dbl>      <dbl>   <dbl> <dbl>
##  1 01001      AL    2021-05-01  6907         3    107          0 123628.  53.7
##  2 01003      AL    2021-05-01 20966        25    306          1  93919. 112. 
##  3 01005      AL    2021-05-01  2302         2     56          0  93251.  81.0
##  4 01007      AL    2021-05-01  2596         2     63          0 115924.  89.3
##  5 01009      AL    2021-05-01  6616         3    135          0 114412.  51.9
##  6 01011      AL    2021-05-01  1230         2     41          1 121770. 198. 
##  7 01013      AL    2021-05-01  2154         2     69          0 110757. 103. 
##  8 01015      AL    2021-05-01 14447         9    312          0 127169.  79.2
##  9 01017      AL    2021-05-01  3544         3    123          1 106574.  90.2
## 10 01019      AL    2021-05-01  1837         1     45          0  70125.  38.2
## # ... with 6,274 more rows, and 13 more variables: tdpm <dbl>, dpm <dbl>,
## #   tcpm7 <dbl>, cpm7 <dbl>, tdpm7 <dbl>, dpm7 <dbl>, countyName <chr>,
## #   vxcpoppct <dbl>, vxcgte18pct <dbl>, vxcgte65pct <dbl>, pop <dbl>,
## #   popgte18 <dbl>, popgte65 <dbl>
# Example for integrated plot data
integrateCountyBurdenVax(cty_newdata_20220507, cty_vaxdata_20220508, keyDates=c(4, 369)) %>%
    makeIntegratedCountyPlotData()
## # A tibble: 28,278 x 3
##    countyFIPS name                 value
##    <chr>      <chr>                <dbl>
##  1 01001      g_tcpm7              1.29 
##  2 01001      g_tdpm7              1.02 
##  3 01001      g_cpm7              -0.143
##  4 01001      g_dpm7               0.4  
##  5 01001      pop              55869    
##  6 01001      mean_vxcpoppct      30.4  
##  7 01001      mean_vxcgte18pct    36.9  
##  8 01001      mean_vxcgte65pct    57.3  
##  9 01001      hasMaxMin            1    
## 10 01003      g_tcpm7              1.66 
## # ... with 28,268 more rows
# Check that data are the same
dfCheck1 <- integrateCountyBurdenVax(cty_newdata_20220507, cty_vaxdata_20220508, keyDates=c(4, 369)) %>%
    makeIntegratedCountyPlotData() %>%
    pivot_wider(countyFIPS) %>%
    filter(pop >= 25000, hasMaxMin==1)
dfCheck2 <- dfPlotVaxBurden %>% 
    pivot_wider(c(countyFIPS, pop, mu_vxcpoppct)) %>%
    rename(mean_vxcpoppct=mu_vxcpoppct)

# Confirm same fields
setdiff(names(dfCheck1), names(dfCheck2))
## [1] "mean_vxcgte18pct" "mean_vxcgte65pct" "hasMaxMin"
setdiff(names(dfCheck2), names(dfCheck1))
## character(0)
# Confirm same counties
setdiff(dfCheck1$countyFIPS, dfCheck2$countyFIPS)
## character(0)
# Confirm same value, except for NA
dfCheck <- dfCheck1 %>% 
    pivot_longer(-c(countyFIPS)) %>%
    inner_join(dfCheck2 %>% pivot_longer(-c(countyFIPS)), by=c("countyFIPS", "name")) %>%
    mutate(diff=(is.na(value.x) != is.na(value.y)) | (!is.na(value.x) & !is.na(value.y) & value.x != value.y))
# Summary of differences by field
dfCheck %>% count(name, diff) %>% pivot_wider(name, names_from="diff", values_from="n")
## # A tibble: 6 x 3
##   name           `FALSE` `TRUE`
##   <chr>            <int>  <int>
## 1 g_cpm7            1474    130
## 2 g_dpm7            1158    446
## 3 g_tcpm7           1603      1
## 4 g_tdpm7           1600      4
## 5 mean_vxcpoppct    1604     NA
## 6 pop               1604     NA
# Check that all are where the previous filtering rules produced NA for value.y
dfCheck %>% filter(diff) %>% is.na %>% colSums()
## countyFIPS       name    value.x    value.y       diff 
##          0          0          0        581          0
# Check values for mismatches
dfCheck %>% filter(diff) %>% mutate(exceeds=(value.x > 10 | value.x < -1)) %>% count(name, exceeds)
## # A tibble: 4 x 3
##   name    exceeds     n
##   <chr>   <lgl>   <int>
## 1 g_cpm7  TRUE      130
## 2 g_dpm7  TRUE      446
## 3 g_tcpm7 TRUE        1
## 4 g_tdpm7 TRUE        4
# Example for integrated plot and regression
integrateCountyBurdenVax(cty_newdata_20220507, cty_vaxdata_20220508, keyDates=c(4, 369)) %>%
    makeIntegratedCountyPlotData() %>%
    plotIntegratedCounty(popRange=c(25000, +Inf)) %>%
    regIntegratedCounty()
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 16 rows containing non-finite values (stat_smooth).
## Warning: Removed 16 rows containing missing values (geom_point).

## 
## Call:
## lm(formula = value ~ get(vaxVar):name + name + 0, data = ., weights = pop)
## 
## Weighted Residuals:
##     Min      1Q  Median      3Q     Max 
## -3468.8  -142.7   -12.7    97.9 16322.7 
## 
## Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)    
## nameg_cpm7              -3.673348   0.156450 -23.479  < 2e-16 ***
## nameg_dpm7               0.426972   0.175463   2.433  0.01499 *  
## nameg_tcpm7              1.205051   0.151815   7.938 2.47e-15 ***
## nameg_tdpm7              1.577777   0.151850  10.390  < 2e-16 ***
## get(vaxVar):nameg_cpm7   0.098047   0.003404  28.801  < 2e-16 ***
## get(vaxVar):nameg_dpm7  -0.011104   0.003751  -2.960  0.00309 ** 
## get(vaxVar):nameg_tcpm7  0.008036   0.003293   2.440  0.01471 *  
## get(vaxVar):nameg_tdpm7 -0.019031   0.003294  -5.777 8.03e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 556.3 on 5639 degrees of freedom
##   (16 observations deleted due to missingness)
## Multiple R-squared:  0.4357, Adjusted R-squared:  0.4349 
## F-statistic: 544.2 on 8 and 5639 DF,  p-value: < 2.2e-16